Abstract—In many practical parameter estimation problems, prescreening and parameter selection are performed prior to estimation. In this paper, we consider the problem of estimating a preselected unknown deterministic parameter chosen from a parameter set based on observations according to a predetermined selection rule. The data-based parameter selection process may impact the subsequent estimation by introducing a selection bias and creating coupling between decoupled parameters. This paper introduces a post-selection mean squared error (PSMSE) criterion as a performance measure. A corresponding Cramér-Rao-type bound on the PSMSE of any $\Psi$-unbiased estimator is derived, where the $\Psi$-unbiasedness is in the Lehmann-unbiasedness sense. The post-selection maximum-likelihood (PSML) estimator is presented. It is proved that if there exists a $\Psi$-unbiased estimator that achieves the $\Psi$-Cramér-Rao bound (CRB), i.e., a $\Psi$-efficient estimator, then it is produced by the PSML estimator. In addition, iterative methods are developed for the practical implementation of the PSML estimator. Finally, the proposed $\Psi$-CRB and PSML estimator are examined in estimation after parameter selection with different distributions.

Index Terms—Non-Bayesian parameter estimation, $\Psi$-Cramér-Rao bound ($\Psi$-CRB), estimation after parameter selection, post-selection maximum-likelihood (PSML) estimator, Lehmann unbiasedness.


I. Introduction 

Statistical inference on multiple parameters often involves a preliminary data-driven parameter selection stage. In the mathematical statistics literature, estimation after parameter selection refers to the problem in which estimation is performed only after a specific population, related to a specific parameter, has been selected from a set of possible independent populations. The population selection is based on some predetermined data-based selection rule, which may not be optimal in any sense. In cognitive radio communications [1], for example, the parameters of a channel are estimated only after the channel has been identified in the white space, often by thresholding the empirical signal-to-noise ratio as a selection criterion. In medical diagnoses, a special test is administered only after other preliminary tests indicate that a patient may have contracted a certain disease. Other applications include multiple radar subset selection problems [2], medical experiments [3], genetic studies [4], and estimation in wireless sensor networks after sensor node selection [5].
Despite the importance of estimation after parameter selection, the impact of the selection procedure on the fundamental limits of estimation performance for general parametric models is not well understood. It is known that the selection process affects the statistical properties of the subsequent estimation [6]. In particular, the conventional bias and mean squared error (MSE) criteria are inappropriate (e.g., [?], [?]), and the conventional Cramér-Rao bound (CRB) is unsuited since it does not take the prescreening process into account. In addition, the selection may create coupling between originally decoupled parameters, and it usually induces a bias, or “winner's curse” [?], on any estimator of the selected unknown parameter. For example, for the exponential family of distributions, no unbiased estimator exists for classical estimation after selection with independent populations, a single sampling stage, and data-based selection rules [?], [?], [10].

A. Summary of results 

In this work, we are interested in the problem of estimation after parameter selection, i.e., estimating a subset of parameters after they are selected based on a data-based selection rule. This problem is a generalization of the classical estimation after selection problem [?], where each parameter is associated with a specific non-overlapping set of observations, named a population, and the populations are assumed to be independent. Another special case of the model considered here is the problem of estimation in the presence of nuisance parameters [11], [12]. In such a problem, the parameter of interest is chosen in advance, independently of the data.

In order to characterize the estimation performance of the selected parameter, we introduce the post-selection MSE (PSMSE) criterion and the concept of $\Psi$-unbiasedness by using the non-Bayesian Lehmann-unbiasedness definition [13]. Then, we develop the appropriate CRB-type bound on the PSMSE of any $\Psi$-unbiased estimator. In addition, we present the post-selection maximum-likelihood (PSML) estimator, which is the corresponding maximum-likelihood (ML) estimator for estimation after parameter selection problems. We show that if a $\Psi$-unbiased estimator exists that achieves the $\Psi$-CRB, it is produced by the PSML estimator. We further develop iterative methods for the practical implementation of the PSML estimator. Finally, the proposed $\Psi$-CRB and PSML estimator are examined on uniform, exponential, and Gaussian distributions with the sample mean selection (SMS) rule.

B. Related works 

The earliest works on classical estimation after selection with independent populations are by Sarkadi, Putter, and Rubinstein in [7] and [8]. These works, as well as studies that appear in the mathematical statistics literature, assume random unknown parameters and show that no unbiased estimator exists for independent Gaussian populations. In the mathematical statistics literature, estimation after selection with independent populations has received considerable attention over the years, where most of the work is restricted to specific parametric models, such as the Gaussian [?], [?], [?], Gamma [?], [?], and uniform [?] models. Several estimation methods have been proposed to reduce the selection bias by employing various iterative methods for bias correction (e.g., [?], [?]). Shrinkage, minimax, and Bayesian techniques have also been studied [?], [?]. For cases in which an unbiased estimator exists, the U-V estimator by Robbins can be used [23]. The current paper provides a general non-Bayesian framework, i.e., where the parameters to be estimated after selection are deterministic and the underlying statistical models are general and admit general dependencies across parameters. In particular, we establish the basic theory of $\Psi$-efficient post-selection estimation, which includes the fundamental limits of estimation, ways to achieve efficiency when an efficient estimator exists, and practical approaches.

In the context of signal processing, the works in [24] and [25] investigate Bayesian estimation after the detection of an unknown data region of interest. The problem of post-detection estimation, or estimation after data censoring, is considered by Chaumette, Larzabal, and Forster [26], [27], who derive a novel CRB on the conditional MSE, involving conditional Fisher information. It should be noted that in [26], [?], [?], [?], the selection rule selects the data to be used, while in our proposed model the parameter to be estimated is selected and all the data can be used for estimation. Selection and ranking are highly related approaches [?]. Detection and estimation after ranking and order statistics procedures are proposed in [28], [29] and are shown to have both practical and theoretical advantages in terms of computational complexity and performance. An empirical Bayes estimator for exponentially distributed populations is proposed in [30]. For the problem of estimation after model selection, a bootstrap method for computing standard errors and confidence intervals is considered in [31], a post-selection lasso method is developed in [32], and the CRB is derived in [33] for model order selection. However, it should be emphasized that in the case of estimation after parameter selection presented here, the measurement model is assumed to be known and there are no modeling errors. In contrast, in estimation after model selection [?], [?], the measurement model is unknown and is selected from a finite collection of competing models.

C. Organization and notations 

The remainder of the paper is organized as follows. Section II presents the mathematical model for the problem of estimation after parameter selection. The $\Psi$-unbiasedness in the Lehmann sense and the $\Psi$-CRB are derived in Section III, and estimation methods for estimation after parameter selection are developed in Section IV. Finally, the proposed $\Psi$-CRB and $\Psi$-unbiased estimators are evaluated via simulations for the linear Gaussian model in Section V. Our conclusions appear in Section VI.

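In the rest of this paper, vectors are denoted by boldface lowercase letters and matrices by boldface uppercase letters. The operators $(\cdot)^T$ and $(\cdot)^{-1}$ denote the transpose and inverse, respectively. The vector $\mathbf{e}_m \in \mathbb{R}^M$ is a vector of all zeros except for a 1 at the $m$th position, $\forall m = 1,\dots,M$, and the $(m,k)$th element of the matrix $\mathbf{A}$ is denoted by $[\mathbf{A}]_{m,k}$. The notations $\delta_{m,k}$ and $\mathbf{1}_{A}$ denote the Kronecker delta function and the indicator function of an event $A$, respectively. The $m$th element of the gradient vector $\nabla_{\boldsymbol{\theta}} c$ is given by $\partial c / \partial \theta_m$, where $\boldsymbol{\theta} = [\theta_1,\dots,\theta_M]^T$, $c$ is an arbitrary scalar function of $\boldsymbol{\theta}$, $\nabla_{\boldsymbol{\theta}}^T c = (\nabla_{\boldsymbol{\theta}} c)^T$, and $\nabla_{\boldsymbol{\theta}}^2 c = \nabla_{\boldsymbol{\theta}} \nabla_{\boldsymbol{\theta}}^T c$. The notations $E_{\boldsymbol{\theta}}[\cdot]$ and $E_{\boldsymbol{\theta}}[\cdot \mid A]$ represent the expected and conditional-expected value of their argument, parameterized by a deterministic parameter $\boldsymbol{\theta}$ and given event $A$.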

II. Problem formulation 

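Let $(\Omega_{\mathbf{x}}, \mathcal{F}, P_{\boldsymbol{\theta}})$ denote a probability space, where $\Omega_{\mathbf{x}}$ is the observation space, $\mathcal{F}$ is the σ-algebra on $\Omega_{\mathbf{x}}$, and $\{P_{\boldsymbol{\theta}}\}$ is a family of probability measures parameterized by the deterministic parameter vector $\boldsymbol{\theta} = [\theta_1,\dots,\theta_M]^T \in \mathbb{R}^M$. Let $\hat{\boldsymbol{\theta}} = [\hat{\theta}_1,\dots,\hat{\theta}_M]^T$ be an estimator of $\boldsymbol{\theta}$, based on a random observation vector $\mathbf{x} \in \Omega_{\mathbf{x}}$, i.e., $\hat{\boldsymbol{\theta}}: \Omega_{\mathbf{x}} \to \mathbb{R}^M$. For each probability measure $P_{\boldsymbol{\theta}}$, the function $f(\mathbf{x}; \boldsymbol{\theta})$ denotes the corresponding probability density function (pdf) of $\mathbf{x}$. All the estimators in this paper are assumed to be in the Hilbert space of absolutely square integrable functions with respect to (w.r.t.) $P_{\boldsymbol{\theta}}$, $\mathcal{L}_2(\boldsymbol{\theta})$.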

The basic structure of the proposed model for estimation after parameter selection consists of two stages: first, a parameter $\theta_m$ is selected according to a predetermined data-driven selection rule, $\Psi$, and then this parameter is estimated. In this work, we assume that the selection rule $\Psi$ is given and we focus on the estimation of the selected parameter. The proposed model is presented schematically in Fig. 1. The extension for a selection of a subset of unknown parameters, i.e., the multiparameter case, is discussed in Section III-D.

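A data-based selection rule is a deterministic function $\Psi: \Omega_{\mathbf{x}} \to \{1,\dots,M\}$ that selects a parameter based on the observation vector, $\mathbf{x}$. That is, if $\Psi(\mathbf{x}) = m$, then the estimation goal is to estimate the parameter $\theta_m$ based on the same observation vector $\mathbf{x}$. We assume that the deterministic sets $\mathcal{A}_m = \{\mathbf{x} : \mathbf{x} \in \Omega_{\mathbf{x}}, \Psi(\mathbf{x}) = m\}$, $m = 1,\dots,M$, partition $\Omega_{\mathbf{x}}$. For the sake of simplicity of notation, in the following $\Psi(\mathbf{x})$ is replaced by $\Psi$. By using Bayes' rule, it can be seen that the pdf of the observation conditioned on the event $\mathbf{x} \in \mathcal{A}_m$ is

$$
f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta}) = \frac{f(\mathbf{x}; \boldsymbol{\theta})}{\Pr(\Psi = m; \boldsymbol{\theta})}, \quad \forall \mathbf{x} \in \mathcal{A}_m, \tag{1}
$$

and is undefined otherwise, where $\Pr(\Psi = m; \boldsymbol{\theta})$ denotes the probability of this event for all $m = 1,\dots,M$.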
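The following is a small numerical illustration of (1). It rests on a hypothetical toy model that the text does not specify: two independent Gaussian populations with unknown means $\theta_1, \theta_2$, a known standard deviation $\sigma$, and $N$ samples each. We also assume that the SMS rule selects the population with the larger sample mean. Under these assumptions, $\Pr(\Psi = 1; \boldsymbol{\theta}) = \Phi\big((\theta_1 - \theta_2)/(\sigma\sqrt{2/N})\big)$, which is the normalizing constant in (1). The sketch compares this closed form with a Monte Carlo selection frequency and evaluates $\log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})$ directly from (1). All function names are illustrative.

```python
import numpy as np
from scipy.stats import norm


def selection_prob(theta, sigma, n):
    """Closed-form Pr(Psi = m; theta), m = 1, 2, under an ASSUMED SMS rule
    (the population with the larger sample mean is selected) for two
    independent Gaussian populations with n samples each and known sigma."""
    s = sigma * np.sqrt(2.0 / n)              # std of ybar_1 - ybar_2
    p1 = norm.cdf((theta[0] - theta[1]) / s)
    return np.array([p1, 1.0 - p1])


def sms_rule(y):
    """Hypothetical SMS rule: y has shape (trials, 2, n); returns the 0-based
    index of the population with the largest sample mean in each trial."""
    return np.argmax(y.mean(axis=-1), axis=-1)


def log_post_selection_pdf(y, theta, sigma, m):
    """log f(y | Psi = m; theta) from (1) for one draw y of shape (2, n);
    only meaningful when sms_rule selects m for this y."""
    n = y.shape[-1]
    log_joint = norm.logpdf(y, loc=np.asarray(theta)[:, None], scale=sigma).sum()
    return log_joint - np.log(selection_prob(theta, sigma, n)[m])


rng = np.random.default_rng(0)
theta, sigma, n = np.array([0.3, 0.0]), 1.0, 10
# 200,000 independent realizations of x = [y_1^T, y_2^T]^T.
y = theta[None, :, None] + sigma * rng.standard_normal((200_000, 2, n))
freq = np.bincount(sms_rule(y), minlength=2) / y.shape[0]
print("closed form:", selection_prob(theta, sigma, n))
print("Monte Carlo:", freq)
```

Because $\Pr(\Psi = m; \boldsymbol{\theta}) < 1$, the post-selection log-density always exceeds the unconditional one on $\mathcal{A}_m$. This shift is what later enters the PSFIM and the PSML estimator.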



Fig. 1. Schematic model of estimation after parameter selection. 
A special case of the proposed model of estimation after parameter selection is estimation in the presence of additional undesired deterministic nuisance parameters [11], [12]. Here, the selection of the desired parameter is performed in advance, independently of the data, $\mathbf{x}$. Therefore, the statistical characteristics, such as the CRB and bias, are not affected by the selection process and are equal to those of the multiparameter estimation, in which the nuisance parameters are estimated as well [?], [34].

A more challenging and relevant application of the proposed model is the classical estimation after selection with independent populations [?], [?], which is presented schematically in Fig. 2. In estimation after selection with independent populations, a given set of $M$ independent populations is assumed. These populations might represent, for example, a set of $M$ different communication channels. For any $m = 1,\dots,M$, it is supposed that $N_m \geq 1$ random observations are drawn from the $m$th population to generate the $m$th observation vector, $\mathbf{y}_m = [y_m[0],\dots,y_m[N_m - 1]]^T$, with the associated marginal pdf, $f_m(\mathbf{y}_m; \theta_m)$, in which $\theta_m \in \mathbb{R}$ denotes the unknown parameter related to the $m$th population. In this case, the observation vector is given by $\mathbf{x} = [\mathbf{y}_1^T,\dots,\mathbf{y}_M^T]^T$, the joint pdf of the $M$ populations is $f(\mathbf{x}; \boldsymbol{\theta}) = \prod_{m=1}^M f_m(\mathbf{y}_m; \theta_m)$, and the selection of a parameter $\theta_m$ is equivalent to the selection of the $m$th population or channel.



Fig. 2. Schematic model of classical estimation after selection with 
independent populations. 


In this work, we are interested in the parameter estimation of the unknown deterministic vector $\boldsymbol{\theta}$, where only estimation errors of the selected parameter are taken into consideration and the selection rule is predetermined. Therefore, for a given selection rule, $\Psi$, we use the following post-selection squared-error (PSSE) cost [?], [?], [?], [35]:

$$
C^{(\Psi)}(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}) = \sum_{m=1}^M (\hat{\theta}_m - \theta_m)^2 \mathbf{1}_{\{\Psi = m\}}. \tag{2}
$$


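The corresponding PSMSE is given by

$$
E_{\boldsymbol{\theta}}\left[C^{(\Psi)}(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\right] = E_{\boldsymbol{\theta}}\left[\sum_{m=1}^M (\hat{\theta}_m - \theta_m)^2 \mathbf{1}_{\{\Psi = m\}}\right] = \sum_{m=1}^M \Pr(\Psi = m; \boldsymbol{\theta})\, E_{\boldsymbol{\theta}}\left[(\hat{\theta}_m - \theta_m)^2 \mid \Psi = m\right], \tag{3}
$$

where the unconditional and conditional expectations are calculated by using the densities $f(\mathbf{x}; \boldsymbol{\theta})$ and $f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})$, respectively. The component-wise PSMSE of a specific parameter is defined as

$$
E_{\boldsymbol{\theta}}\left[(\hat{\theta}_m - \theta_m)^2 \mid \Psi = m\right], \quad m = 1,\dots,M.
$$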
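The decomposition in (3) is the law of total expectation. The sketch below checks it by Monte Carlo under the same hypothetical setup as before: independent Gaussian populations with known $\sigma$ and $N$ samples each, and an assumed SMS rule that selects the largest sample mean. It uses the naive estimator $\hat{\theta}_m = \bar{y}_m$ (the sample means) and computes its PSMSE twice. The first computation averages the PSSE cost (2) directly. The second weights the component-wise PSMSEs by the empirical selection frequencies. Over a single set of draws the two numbers coincide up to floating-point error. The function `psmse_two_ways` is illustrative only.

```python
import numpy as np


def psmse_two_ways(theta, sigma, n, trials, seed=0):
    """Monte Carlo PSMSE of the sample-mean estimator under an ASSUMED SMS rule
    (largest sample mean wins), computed directly from the PSSE cost (2) and
    via the decomposition in (3)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    m_count = theta.size
    # Sample means of each population: ybar_m ~ N(theta_m, sigma^2 / n).
    ybar = theta + sigma / np.sqrt(n) * rng.standard_normal((trials, m_count))
    psi = np.argmax(ybar, axis=1)                 # selected index per trial
    err2 = (ybar - theta) ** 2                    # squared errors, all components
    # Direct form: the PSSE cost keeps only the selected component's error.
    direct = err2[np.arange(trials), psi].mean()
    # Decomposed form: sum_m Pr(Psi = m) * E[(theta_hat_m - theta_m)^2 | Psi = m].
    decomposed = 0.0
    cond_mse = np.full(m_count, np.nan)
    for m in range(m_count):
        sel = psi == m
        if sel.any():
            cond_mse[m] = err2[sel, m].mean()
            decomposed += sel.mean() * cond_mse[m]
    return direct, decomposed, cond_mse


direct, decomposed, cond_mse = psmse_two_ways([0.0, 0.0], 1.0, 10, 100_000)
print(direct, decomposed, cond_mse)
```

The errors of the unselected components never enter either computation. This is the property the next paragraph describes.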


The use of the indicator functions implies that the PSMSE may be equal for two estimators that have different values with a nonzero probability, i.e., outside the subset indicated by these indicators. In fact, $\hat{\theta}_m$ affects the PSMSE only for observations $\mathbf{x} \in \mathcal{A}_m$.

In the mathematical statistics literature, the unknown parameter for estimation after selection with independent populations is usually defined as $\theta_\Psi = \sum_{m=1}^M \theta_m \mathbf{1}_{\{\Psi = m\}}$, which has both random and deterministic components. In this work, we are interested in the estimation of the deterministic parameter $\boldsymbol{\theta}$. The notion of non-Bayesian estimation allows the derivation of the corresponding CRB-type lower bound and non-Bayesian estimation methods.


III. The Cramér-Rao-type bound for estimation after parameter selection

The CRB (e.g., [11], [36]) provides a lower bound on the MSE of any mean-unbiased estimator and is used as a benchmark to study the optimality of practical parametric estimators. In this section, a CRB-type lower bound for estimation after parameter selection is derived. The proposed bound is a lower bound on the PSMSE of any Lehmann-unbiased estimator, as described in the following.


A. $\Psi$-unbiasedness


The mean-unbiasedness constraint is commonly used in non-Bayesian parameter estimation [?]. However, mean-unbiasedness is inappropriate for estimation after parameter selection problems, since we are interested only in errors of the selected parameter (see, e.g., [6], [10]). Lehmann [13] proposed a generalization of the unbiasedness concept, which is based on the considered cost function. In this section, the general Lehmann unbiasedness is used to define unbiasedness for estimation after parameter selection problems.

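Definition 1: The estimator $\hat{\boldsymbol{\theta}}$ is said to be an unbiased estimator of $\boldsymbol{\theta}$ in the Lehmann sense [13] w.r.t. the scalar nonnegative cost function $C(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})$ if

$$
E_{\boldsymbol{\theta}}\left[C(\hat{\boldsymbol{\theta}}, \boldsymbol{\eta})\right] \geq E_{\boldsymbol{\theta}}\left[C(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\right], \quad \forall \boldsymbol{\eta}, \boldsymbol{\theta} \in \Omega_{\boldsymbol{\theta}}, \tag{4}
$$

where $\Omega_{\boldsymbol{\theta}}$ is the parameter space.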
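As a minimal worked instance of Definition 1, take a scalar parameter with $\Omega_\theta = \mathbb{R}$ and the squared-error cost $C(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$. This is a standard calculation, included here only for illustration. Adding and subtracting $\theta$ inside the cost gives

$$
E_\theta\big[(\hat{\theta} - \eta)^2\big] - E_\theta\big[(\hat{\theta} - \theta)^2\big] = (\theta - \eta)^2 + 2(\theta - \eta)\, E_\theta\big[\hat{\theta} - \theta\big].
$$

If $E_\theta[\hat{\theta} - \theta] = b \neq 0$, choosing $\eta = \theta + b$ makes the right-hand side equal to $-b^2 < 0$, which violates (4). If $b = 0$, the right-hand side equals $(\theta - \eta)^2 \geq 0$. Hence, under squared error, (4) holds iff $E_\theta[\hat{\theta}] = \theta$.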

The Lehmann unbiasedness definition implies that an estimator is unbiased if, on average, it is “closer” to the true parameter $\boldsymbol{\theta}$ than to any other value in the parameter space. The measure of “closeness” between the estimator and the parameter is the mean of the cost function, $C(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})$. For example, it is shown in [?] that under the squared-error cost function the Lehmann unbiasedness in (4) reduces to the conventional mean-unbiasedness, $E_{\boldsymbol{\theta}}[\hat{\boldsymbol{\theta}}] = \boldsymbol{\theta}$, $\forall \boldsymbol{\theta} \in \Omega_{\boldsymbol{\theta}}$. Additional examples of Lehmann unbiasedness with different cost functions can be found, for example, in [13], [37], and [38]. The following proposition describes the Lehmann unbiasedness for estimation after parameter selection, named $\Psi$-unbiasedness.
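Proposition 1: An estimator $\hat{\boldsymbol{\theta}}: \Omega_{\mathbf{x}} \to \mathbb{R}^M$ is an unbiased estimator of $\boldsymbol{\theta} \in \mathbb{R}^M$ in the Lehmann sense w.r.t. the PSSE cost and the selection rule $\Psi$ iff

$$
E_{\boldsymbol{\theta}}\left[(\hat{\theta}_m - \theta_m)\mathbf{1}_{\{\Psi = m\}}\right] = 0, \quad \forall m = 1,\dots,M, \ \forall \boldsymbol{\theta} \in \mathbb{R}^M, \tag{5}
$$

or, equivalently,

$$
E_{\boldsymbol{\theta}}\left[\hat{\theta}_m - \theta_m \mid \Psi = m\right] = 0, \quad \forall \boldsymbol{\theta} \in \mathbb{R}^M, \tag{6}
$$

for all $m = 1,\dots,M$ such that $\Pr(\Psi = m; \boldsymbol{\theta}) \neq 0$.

Proof: The proof appears in Appendix A.

It can be seen that the Lehmann unbiasedness definition in (5) and (6) is a function of the given selection rule. Therefore, in the following, an estimator $\hat{\boldsymbol{\theta}}$ is said to be a $\Psi$-unbiased estimator of $\boldsymbol{\theta}$ for the selection rule $\Psi$ if (5) (or, equivalently, (6)) is satisfied. The concept of risk-unbiasedness in the Lehmann sense for the classical estimation after selection with independent populations has been discussed in the literature for the random parameter $\sum_{m=1}^M \theta_m \mathbf{1}_{\{\Psi = m\}}$ and various cost functions (e.g., [?] and [?]).

B. The $\Psi$-CRB

Obtaining an estimator with the minimum PSMSE among all $\Psi$-unbiased estimators is usually not tractable, and a uniform $\Psi$-unbiased minimum PSMSE estimator may not exist [?]. Thus, lower bounds on the performance of any $\Psi$-unbiased estimator are useful for performance analysis and system design. In the following, a new version of the CRB for estimation after parameter selection is derived. It should be noted that, in general, the minimum PSMSE estimator is not unique, since only the estimation errors of the selected parameter are taken into consideration.

Let us define the following post-selection Fisher information matrix (PSFIM) of the $m$th component:

$$
\mathbf{J}_m(\boldsymbol{\theta}, \Psi) = E_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}} \log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})\, \nabla_{\boldsymbol{\theta}}^T \log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta}) \mid \Psi = m\right], \tag{7}
$$

for all $m = 1,\dots,M$. In addition, we define the following conditions, which are a modified form of the well-known CRB regularity conditions (e.g., [36], pp. 440-441).

C.1) The post-selection likelihood gradient vector, $\nabla_{\boldsymbol{\theta}} \log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})$, exists and is finite for any $\boldsymbol{\theta} \in \mathbb{R}^M$, $\mathbf{x} \in \mathcal{A}_m$, and $\forall m = 1,\dots,M$. That is, we assume that the $m$th PSFIM, $\mathbf{J}_m(\boldsymbol{\theta}, \Psi)$, is a well-defined, nonsingular, and nonzero matrix for any $\boldsymbol{\theta} \in \mathbb{R}^M$ and $\forall m = 1,\dots,M$.

C.2) The operations of integration w.r.t. $\mathbf{x}$ and differentiation w.r.t. $\boldsymbol{\theta}$ can be interchanged as follows:

$$
\int_{\mathcal{A}_m} \nabla_{\boldsymbol{\theta}}\left(g(\mathbf{x}, \boldsymbol{\theta}) f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})\right) d\mathbf{x} = \nabla_{\boldsymbol{\theta}} E_{\boldsymbol{\theta}}\left[g(\mathbf{x}, \boldsymbol{\theta}) \mid \Psi = m\right], \tag{8}
$$

for any $\boldsymbol{\theta} \in \mathbb{R}^M$ and for any differentiable and measurable function $g(\mathbf{x}, \boldsymbol{\theta})$.

Theorem 1 ($\Psi$-CRB): Let the regularity conditions C.1-C.2 be satisfied, and let $\hat{\boldsymbol{\theta}}$ be a $\Psi$-unbiased estimator of $\boldsymbol{\theta} \in \mathbb{R}^M$ for a given selection rule, $\Psi$, with a finite second moment. Then, the PSMSE is bounded by the following Cramér-Rao-type lower bound:

$$
E_{\boldsymbol{\theta}}\left[C^{(\Psi)}(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\right] \geq B^{(\Psi)}(\boldsymbol{\theta}), \tag{9}
$$

where

$$
B^{(\Psi)}(\boldsymbol{\theta}) = \sum_{m=1}^M \Pr(\Psi = m; \boldsymbol{\theta}) \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{m,m}, \tag{10}
$$

and $\mathbf{J}_m(\boldsymbol{\theta}, \Psi)$ is the PSFIM defined in (7). Furthermore, the component-wise $\Psi$-CRB on the PSMSE of a specific parameter is given by

$$
E_{\boldsymbol{\theta}}\left[(\hat{\theta}_m - \theta_m)^2 \mid \Psi = m\right] \geq \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{m,m}, \tag{11}
$$

for all $m = 1,\dots,M$. Equality holds in (9) and (11) iff there exist functions $h_m(\boldsymbol{\theta})$, $m = 1,\dots,M$, such that

$$
\sum_{l=1}^M \frac{\partial \log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})}{\partial \theta_l} \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{l,m} = h_m(\boldsymbol{\theta})(\hat{\theta}_m - \theta_m), \quad \forall m = 1,\dots,M, \tag{12}
$$

almost surely (a.s.) for $\mathbf{x} \in \mathcal{A}_m$.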

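Proof: According to the Cauchy-Schwarz inequality,

$$
E_{\boldsymbol{\theta}}\left[\eta^2(\mathbf{x}, \boldsymbol{\theta}) \mid \Psi = m\right] \geq \frac{E_{\boldsymbol{\theta}}^2\left[\eta(\mathbf{x}, \boldsymbol{\theta})\, \zeta(\mathbf{x}, \boldsymbol{\theta}) \mid \Psi = m\right]}{E_{\boldsymbol{\theta}}\left[\zeta^2(\mathbf{x}, \boldsymbol{\theta}) \mid \Psi = m\right]}, \tag{13}
$$

for any measurable functions $\eta(\mathbf{x}, \boldsymbol{\theta})$ and $\zeta(\mathbf{x}, \boldsymbol{\theta})$ with finite second moments. By substituting $\eta(\mathbf{x}, \boldsymbol{\theta}) = \hat{\theta}_m - \theta_m$ and

$$
\zeta(\mathbf{x}, \boldsymbol{\theta}) = \sum_{l=1}^M \frac{\partial \log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})}{\partial \theta_l} \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{l,m}
$$

in (13), and under Condition C.1, one obtains

$$
E_{\boldsymbol{\theta}}\left[(\hat{\theta}_m - \theta_m)^2 \mid \Psi = m\right] \geq \frac{\left(\sum_{l=1}^M a_{l,m}(\boldsymbol{\theta}) \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{l,m}\right)^2}{\sum_{l=1}^M \sum_{k=1}^M \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{k,m} \left[\mathbf{J}_m(\boldsymbol{\theta}, \Psi)\right]_{k,l} \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{l,m}} \tag{14}
$$

for any estimator with $E_{\boldsymbol{\theta}}[(\hat{\theta}_m - \theta_m)^2 \mid \Psi = m] < \infty$ and a nonsingular PSFIM, where

$$
a_{l,m}(\boldsymbol{\theta}) \triangleq E_{\boldsymbol{\theta}}\left[\frac{\partial \log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})}{\partial \theta_l}\, (\hat{\theta}_m - \theta_m) \,\Big|\, \Psi = m\right]
$$

for all $m, l = 1,\dots,M$. According to the Cauchy-Schwarz conditions, equality in (14) holds iff (12) is satisfied for $m \in \{1,\dots,M\}$. By using integration by parts and assuming Condition C.2, it can be verified that

$$
a_{l,m}(\boldsymbol{\theta}) = \frac{\partial}{\partial \theta_l} E_{\boldsymbol{\theta}}\left[\hat{\theta}_m - \theta_m \mid \Psi = m\right] + \delta_{m,l} = \delta_{m,l}, \tag{15}
$$

for all $m, l = 1,\dots,M$, where the last equality is obtained by using the $\Psi$-unbiasedness condition from (6). In addition, it can be verified that

$$
\sum_{l=1}^M \sum_{k=1}^M \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{k,m} \left[\mathbf{J}_m(\boldsymbol{\theta}, \Psi)\right]_{k,l} \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{l,m} = \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{m,m}. \tag{16}
$$

By substituting (15) and (16) in (14), we obtain the component-wise $\Psi$-CRBs on the PSMSE in (11). Then, by multiplying (11) by $\Pr(\Psi = m; \boldsymbol{\theta})$ and summing over $m = 1,\dots,M$, we obtain the $\Psi$-CRB in (9). Furthermore, the equality condition in (12) stems from the equality conditions of (13). ■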
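Condition (6) rules out many natural estimators, and this can be checked numerically. The sketch again uses the hypothetical model of two independent Gaussian populations with known $\sigma$ and $N$ samples each, and an assumed SMS rule that selects the larger sample mean. When $\theta_1 = \theta_2$, the selected sample mean is the maximum of two i.i.d. $N(\theta, \sigma^2/N)$ variables. By the standard order-statistics result $E[\max(Z_1, Z_2)] = 1/\sqrt{\pi}$ for i.i.d. standard normals, its conditional bias is $\sigma/\sqrt{\pi N}$ rather than zero. The naive estimator $\hat{\theta}_m = \bar{y}_m$ is therefore not $\Psi$-unbiased, and Theorem 1 does not directly apply to it. The function name below is illustrative.

```python
import numpy as np


def conditional_bias(theta, sigma, n, trials, seed=1):
    """Monte Carlo estimate of E[theta_hat_m - theta_m | Psi = m] for the
    sample-mean estimator under an ASSUMED SMS rule (largest sample mean)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    ybar = theta + sigma / np.sqrt(n) * rng.standard_normal((trials, theta.size))
    psi = np.argmax(ybar, axis=1)
    bias = np.full(theta.size, np.nan)
    for m in range(theta.size):
        sel = psi == m                      # trials in which population m is selected
        if sel.any():
            bias[m] = (ybar[sel, m] - theta[m]).mean()
    return bias


sigma, n = 1.0, 10
bias = conditional_bias([0.0, 0.0], sigma, n, 200_000)
print(bias, sigma / np.sqrt(np.pi * n))   # both entries near 0.178
```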

The following lemma presents two alternative formulations of the PSFIM that are based on the (unconditional) likelihood function and the probability of selection, instead of the conditional likelihood used in (7). These formulations can be more tractable for further estimation and sampling procedures.

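Lemma 1: Assume that Conditions C.1-C.2 are satisfied, that the second derivative w.r.t. $\boldsymbol{\theta}$ of $f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})$ exists and is bounded and continuous $\forall \mathbf{x} \in \mathcal{A}_m$, and that the integral $\int_{\mathcal{A}_m} f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})\, d\mathbf{x}$ is twice differentiable under the integral sign for all $m = 1,\dots,M$, $\boldsymbol{\theta} \in \mathbb{R}^M$. Then, the $m$th PSFIM in (7) satisfies

$$
\mathbf{J}_m(\boldsymbol{\theta}, \Psi) = E_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}} \log f(\mathbf{x}; \boldsymbol{\theta})\, \nabla_{\boldsymbol{\theta}}^T \log f(\mathbf{x}; \boldsymbol{\theta}) \mid \Psi = m\right] - \nabla_{\boldsymbol{\theta}} \log \Pr(\Psi = m; \boldsymbol{\theta})\, \nabla_{\boldsymbol{\theta}}^T \log \Pr(\Psi = m; \boldsymbol{\theta}) \tag{17}
$$

and

$$
\mathbf{J}_m(\boldsymbol{\theta}, \Psi) = -E_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}}^2 \log f(\mathbf{x}; \boldsymbol{\theta}) \mid \Psi = m\right] + \nabla_{\boldsymbol{\theta}}^2 \log \Pr(\Psi = m; \boldsymbol{\theta}), \tag{18}
$$

for all $m = 1,\dots,M$, $\boldsymbol{\theta} \in \mathbb{R}^M$, and for any selection rule $\Psi$.

Proof: The proof appears in Appendix B.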
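The two forms in Lemma 1 can be cross-checked numerically. The sketch uses the hypothetical model of two independent Gaussian populations (known $\sigma$, $N$ samples each) with an assumed SMS rule that selects the larger sample mean. In this model $-\nabla_{\boldsymbol{\theta}}^2 \log f(\mathbf{x}; \boldsymbol{\theta}) = (N/\sigma^2)\mathbf{I}$, and $\Pr(\Psi = m; \boldsymbol{\theta}) = \Phi(d_m)$ with $d_m = \pm(\theta_1 - \theta_2)/(\sigma\sqrt{2/N})$. As a result, (18) has a closed form. The sketch estimates (17) by Monte Carlo, using the score $(N/\sigma^2)(\bar{\mathbf{y}} - \boldsymbol{\theta})$, and compares the two. The helper names are illustrative, and the closed form is our own derivation for this toy model.

```python
import numpy as np
from scipy.stats import norm


def mills(d):
    """phi(d) / Phi(d), computed in log space for numerical stability."""
    return np.exp(norm.logpdf(d) - norm.logcdf(d))


def psfim_closed_form(theta, sigma, n, m):
    """J_m via (18) for the ASSUMED two-population Gaussian SMS model; m in {0, 1}."""
    s = sigma * np.sqrt(2.0 / n)
    sign = 1.0 if m == 0 else -1.0
    d = sign * (theta[0] - theta[1]) / s          # Pr(Psi = m) = Phi(d)
    lam = mills(d)
    g2 = -lam * (d + lam)                         # second derivative of log Phi at d
    coupling = np.array([[1.0, -1.0], [-1.0, 1.0]])
    # -E[Hessian of log f | Psi = m] = (n / sigma^2) I for this model.
    return (n / sigma**2) * np.eye(2) + (g2 / s**2) * coupling


def psfim_monte_carlo(theta, sigma, n, m, trials, seed=2):
    """J_m via (17): conditional second moment of the unconditional score minus
    the outer product of grad log Pr(Psi = m; theta)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    ybar = theta + sigma / np.sqrt(n) * rng.standard_normal((trials, 2))
    score = (n / sigma**2) * (ybar - theta)       # grad of log f(x; theta)
    sel = np.argmax(ybar, axis=1) == m
    second_moment = score[sel].T @ score[sel] / sel.sum()
    s = sigma * np.sqrt(2.0 / n)
    sign = 1.0 if m == 0 else -1.0
    d = sign * (theta[0] - theta[1]) / s
    grad_log_pr = mills(d) / s * sign * np.array([1.0, -1.0])
    return second_moment - np.outer(grad_log_pr, grad_log_pr)


theta, sigma, n = np.array([0.2, 0.0]), 1.0, 10
J_cf = psfim_closed_form(theta, sigma, n, 0)
J_mc = psfim_monte_carlo(theta, sigma, n, 0, 400_000)
print(J_cf)
print(J_mc)
```

Form (18) needs no sampling here, which is the practical advantage the lemma points to.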

Similar to the $\Psi$-CRB from Theorem 1, by using the Cauchy-Schwarz inequality and the $\Psi$-unbiasedness, we can obtain various non-Bayesian bounds on the PSMSE. These bounds are modifications of various non-Bayesian bounds, such as the biased CRB and Barankin-type bounds [?], [?], [?].


C. Special cases 

In this section, we demonstrate the proposed $\Psi$-CRB and $\Psi$-unbiasedness for different cases.

1) Randomized selection rule: The randomized, coin-flipping selection rule satisfies $\Pr(\Psi_{\mathrm{rand}} = m; \boldsymbol{\theta}) = p_m$ for all $m = 1,\dots,M$, where $\{p_m\} \in [0,1]^M$ are independent of $\mathbf{x}$. Therefore, the $\Psi$-unbiasedness from (6) in this case is given by

$$
E_{\boldsymbol{\theta}}\left[\hat{\theta}_m - \theta_m\right] = 0, \quad \forall \boldsymbol{\theta} \in \mathbb{R}^M, \tag{19}
$$

for all $m = 1,\dots,M$ with $p_m \neq 0$. The $\Psi$-unbiasedness in (19) is the classical mean-unbiasedness definition. In this case, the $\Psi$-CRB from (10) reduces to

$$
B^{(\Psi_{\mathrm{rand}})}(\boldsymbol{\theta}) = \sum_{m=1}^M p_m \left[\mathbf{B}(\boldsymbol{\theta})\right]_{m,m}, \tag{20}
$$

where $\mathbf{B}(\boldsymbol{\theta}) = \mathbf{J}^{-1}(\boldsymbol{\theta})$ is the conventional CRB. Thus, the proposed $\Psi$-CRB for the randomized selection rule, $\Psi_{\mathrm{rand}}$, is equal to a weighted sum of the diagonal elements of the classical CRB, $\mathbf{B}(\boldsymbol{\theta})$, for estimating $\boldsymbol{\theta}$ without a selection stage. In particular, for $p_m = \delta_{m,m'}$, where $\theta_{m'}$ is the desired parameter, we obtain an estimation problem in the presence of nuisance parameters, i.e., where the selection of the “parameter of interest” $\theta_{m'}$ is performed in advance. It is easy to verify that in this case the $\Psi$-CRB and $\Psi$-unbiasedness reduce to their classical, marginal versions. This result coincides with the literature on non-Bayesian nuisance parameter estimation (e.g., [?] and [?]).
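Equation (20) needs only the conventional FIM and the selection probabilities. The sketch below evaluates it for an arbitrary, hypothetical $2 \times 2$ FIM. With $\mathbf{p} = \mathbf{e}_1$ it returns $[\mathbf{B}(\boldsymbol{\theta})]_{1,1}$, which is the marginal CRB of the nuisance-parameter case. The function name is illustrative.

```python
import numpy as np


def psi_crb_randomized(fim, p):
    """Psi-CRB (20) for a randomized selection rule: sum_m p_m [J^{-1}]_{m,m}."""
    fim = np.asarray(fim, dtype=float)
    p = np.asarray(p, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must be a probability vector")
    crb = np.linalg.inv(fim)                  # conventional CRB B(theta)
    return float(p @ np.diag(crb))


# Hypothetical FIM with coupled parameters (values chosen for illustration).
fim = np.array([[4.0, 1.0], [1.0, 2.0]])
print(psi_crb_randomized(fim, [0.5, 0.5]))   # 3/7
print(psi_crb_randomized(fim, [1.0, 0.0]))   # nuisance case: [B]_{1,1} = 2/7
```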

2) Parameter coupling: For conventional parameter estimation with a diagonal FIM, where the FIM is defined as

$$
\mathbf{J}(\boldsymbol{\theta}) = E_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}} \log f(\mathbf{x}; \boldsymbol{\theta})\, \nabla_{\boldsymbol{\theta}}^T \log f(\mathbf{x}; \boldsymbol{\theta})\right], \tag{21}
$$

the unknown parameters are decoupled from each other; that is, knowledge of one parameter does not affect the accuracy in the estimation of the others. This situation occurs, for example, for classical estimation after selection with independent populations, in which $f(\mathbf{x}; \boldsymbol{\theta}) = \prod_{m=1}^M f_m(\mathbf{y}_m; \theta_m)$. However, it should be noted that the PSFIMs are not necessarily diagonal in diagonal-FIM cases, since the selection step may create dependency and coupling between the parameters over the different populations. For example, by using the form of the PSFIM in (18), it can be seen that the matrix $\nabla_{\boldsymbol{\theta}}^2 \log \Pr(\Psi = m; \boldsymbol{\theta})$ may be nondiagonal for a data-dependent selection rule.
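The coupling can be seen in closed form under a hypothetical model: two independent Gaussian populations with known $\sigma$ and $N$ samples each, and an assumed SMS rule that selects the larger sample mean. Here the FIM (21) is the diagonal matrix $(N/\sigma^2)\mathbf{I}$. However, the Hessian of $\log \Pr(\Psi = m; \boldsymbol{\theta})$ in (18) adds a term proportional to $\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$, so each PSFIM has nonzero off-diagonal entries. The sketch also evaluates the $\Psi$-CRB (10) relative to the conventional per-component CRB $\sigma^2/N$. The formulas are our own derivation for this toy model, and the function names are illustrative.

```python
import numpy as np
from scipy.stats import norm


def psfim_sms(theta, sigma, n, m):
    """PSFIM J_m from (18) for two Gaussian populations under an ASSUMED SMS rule
    (larger sample mean selected); m is 0 or 1."""
    s = sigma * np.sqrt(2.0 / n)
    d = (1.0 if m == 0 else -1.0) * (theta[0] - theta[1]) / s
    lam = np.exp(norm.logpdf(d) - norm.logcdf(d))   # phi(d) / Phi(d)
    g2 = -lam * (d + lam)                           # (log Phi)'' evaluated at d
    coupling = np.array([[1.0, -1.0], [-1.0, 1.0]])
    return (n / sigma**2) * np.eye(2) + (g2 / s**2) * coupling


def psi_crb_sms(theta, sigma, n):
    """Psi-CRB (10): sum_m Pr(Psi = m; theta) [J_m^{-1}]_{m,m}."""
    s = sigma * np.sqrt(2.0 / n)
    p1 = norm.cdf((theta[0] - theta[1]) / s)
    probs = (p1, 1.0 - p1)
    return sum(probs[m] * np.linalg.inv(psfim_sms(theta, sigma, n, m))[m, m]
               for m in range(2))


sigma, n = 1.0, 10
print(psfim_sms([0.0, 0.0], sigma, n, 0))     # nonzero off-diagonal entries
for delta in (0.0, 0.5, 1.0, 2.0):
    ratio = psi_crb_sms([delta, 0.0], sigma, n) / (sigma**2 / n)
    print(f"theta1 - theta2 = {delta:.1f}: Psi-CRB / CRB = {ratio:.3f}")
```

In this configuration, the ratio is well above 1 at $\theta_1 = \theta_2$. It approaches 1 as the gap grows and the selection becomes nearly deterministic. The bound holds only for $\Psi$-unbiased estimators, so it need not lower-bound the PSMSE of the biased sample-mean estimator.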

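3) Biased $\Psi$-CRB: Similar to the proof of Theorem 1, it can be shown that under regularity conditions C.1-C.2 the PSMSE is bounded by the following biased $\Psi$-CRB:

$$
E_{\boldsymbol{\theta}}\left[C^{(\Psi)}(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\right] \geq \sum_{m=1}^M \Pr(\Psi = m; \boldsymbol{\theta}) \left(\nabla_{\boldsymbol{\theta}} b_m(\boldsymbol{\theta}) + \mathbf{e}_m\right)^T \mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi) \left(\nabla_{\boldsymbol{\theta}} b_m(\boldsymbol{\theta}) + \mathbf{e}_m\right), \tag{22}
$$

for any $\Psi$-biased estimator, $\hat{\boldsymbol{\theta}}$, with the biases

$$
b_m(\boldsymbol{\theta}) \triangleq E_{\boldsymbol{\theta}}\left[\hat{\theta}_m - \theta_m \mid \Psi = m\right], \quad m = 1,\dots,M,
$$

and a finite second moment.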


D. Estimation after parameter subset selection 

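In many problems, we are interested in selecting a subset of parameters and then estimating the parameters of the selected subset [?], [?]. This subset may be of random size, with or without overlap between the subsets. The selection rule is $\Psi \in \{\Psi_1,\dots,\Psi_L\}$, where $\{\Psi_1,\dots,\Psi_L\}$ is a finite covering of the set $\{1,\dots,M\}$, i.e., a division of $\{1,\dots,M\}$ into a union of $L$ possibly overlapping nonempty subsets, such as the power set. In this case, the PSSE cost function from (2) is replaced by

$$
C^{(\Psi)}(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}) = \sum_{m=1}^M (\hat{\theta}_m - \theta_m)^2 \mathbf{1}_{\{m \in \Psi\}},
$$

and the corresponding PSMSE is

$$
E_{\boldsymbol{\theta}}\left[C^{(\Psi)}(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\right] = \sum_{l=1}^L \Pr(\Psi = \Psi_l; \boldsymbol{\theta}) \sum_{m=1}^M E_{\boldsymbol{\theta}}\left[(\hat{\theta}_m - \theta_m)^2 \mid \Psi = \Psi_l\right] \mathbf{1}_{\{m \in \Psi_l\}}.
$$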
Similar to Proposition 1 and Theorem 1, the $\Psi$-unbiasedness and $\Psi$-CRB for subset selection are, respectively, given by

$$
E_{\boldsymbol{\theta}}\left[(\hat{\theta}_m - \theta_m)\mathbf{1}_{\{m \in \Psi_l\}}\mathbf{1}_{\{\Psi = \Psi_l\}}\right] = 0, \quad \forall m = 1,\dots,M, \ \forall l = 1,\dots,L, \tag{23}
$$

for any $\boldsymbol{\theta} \in \mathbb{R}^M$, and

$$
E_{\boldsymbol{\theta}}\left[C^{(\Psi)}(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\right] \geq B^{(\Psi)}(\boldsymbol{\theta}),
$$

where

$$
B^{(\Psi)}(\boldsymbol{\theta}) = \sum_{l=1}^L \Pr(\Psi = \Psi_l; \boldsymbol{\theta}) \sum_{m=1}^M \left[\mathbf{J}_l^{-1}(\boldsymbol{\theta}, \Psi)\right]_{m,m} \mathbf{1}_{\{m \in \Psi_l\}} \tag{24}
$$

and

$$
\mathbf{J}_l(\boldsymbol{\theta}, \Psi) = E_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}} \log f(\mathbf{x} \mid \Psi = \Psi_l; \boldsymbol{\theta})\, \nabla_{\boldsymbol{\theta}}^T \log f(\mathbf{x} \mid \Psi = \Psi_l; \boldsymbol{\theta}) \mid \Psi = \Psi_l\right],
$$

for all $l = 1,\dots,L$, is the PSFIM for this case.

If the selection rule selects all the $M$ parameters, then the PSMSE is equal to the MSE, and we obtain the conventional parameter estimation problem, mean-unbiasedness, and the well-known CRB. Another special case of estimation after parameter subset selection is the classical estimation after selection model with independent populations, where the pdf of each single population is a function of multiple unknown parameters. For this nonoverlapping case, we can also obtain a matrix form of the $\Psi$-CRB from (24) by using the matrix form of the Cauchy-Schwarz inequality and the vector unbiasedness from (23).

E. Estimation after data censoring

A related problem is estimation after data censoring, which is obtained from the aforementioned model by assuming a selection rule that restricts the set of observations available for parameter estimation. In this case, we use the observations only if $\Psi = 1$, and we discard them otherwise. Similar to the derivation of the $\Psi$-CRB in Theorem 1, the following matrix-form $\Psi$-CRB is obtained for this case:

$$
E_{\boldsymbol{\theta}}\left[(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^T \mid \Psi = 1\right] \succeq \mathbf{J}_c^{-1}(\boldsymbol{\theta}, \Psi), \tag{25}
$$

where

$$
\mathbf{J}_c(\boldsymbol{\theta}, \Psi) = E_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}} \log f(\mathbf{x} \mid \Psi = 1; \boldsymbol{\theta})\, \nabla_{\boldsymbol{\theta}}^T \log f(\mathbf{x} \mid \Psi = 1; \boldsymbol{\theta}) \mid \Psi = 1\right]. \tag{26}
$$

The bound in (25) coincides with the conditional CRB derived in [26] for estimation after binary detection, i.e., when a binary detection step is performed before the estimation of the parameters. This observation remains valid for any non-Bayesian bound on the PSMSE that can be derived by using the Cauchy-Schwarz inequality and the $\Psi$-unbiasedness, in a similar way to the derivations in [26], [43]. However, it should be noted that in estimation after data censoring the selection rule selects the data, while in our model the parameter to be estimated is selected.

IV. Post-selection estimation

A. The PSML estimator

For general parameter estimation, the commonly used ML estimator is defined as

$$
\hat{\boldsymbol{\theta}}^{(\mathrm{ML})} = \arg\max_{\boldsymbol{\theta}} \log f(\mathbf{x}; \boldsymbol{\theta}). \tag{27}
$$


It is well known that the ML estimator is inappropriate for estimation after parameter selection, since it does not take into account the prescreening process [7], [8]. Inspired by Theorem 1, we define the PSML estimator as

$$
\hat{\boldsymbol{\theta}}^{(\mathrm{PSML})} = \arg\max_{\boldsymbol{\theta}} \left\{\sum_{m=1}^M \log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})\, \mathbf{1}_{\{\mathbf{x} \in \mathcal{A}_m\}}\right\} = \arg\max_{\boldsymbol{\theta}} \left\{\log f(\mathbf{x}; \boldsymbol{\theta}) - \sum_{m=1}^M \log \Pr(\Psi = m; \boldsymbol{\theta})\, \mathbf{1}_{\{\mathbf{x} \in \mathcal{A}_m\}}\right\}, \tag{28}
$$

where the last equality is obtained by using (1). We propose using the PSML estimator instead of the ML estimator for estimation after parameter selection problems. The PSML estimator can be interpreted as a “penalized ML estimator” [36], where the penalty term in this case is $-\sum_{m=1}^M \log \Pr(\Psi = m; \boldsymbol{\theta})\, \mathbf{1}_{\{\mathbf{x} \in \mathcal{A}_m\}}$. However, since the penalty term is not a probability density w.r.t. $\boldsymbol{\theta}$, (28) does not have a Bayesian interpretation. It can be seen that if the selection probability, $\Pr(\Psi = m; \boldsymbol{\theta})$, is not a function of $\boldsymbol{\theta}$, then the PSML estimator coincides with the ML estimator. This situation occurs, for example, for a randomized selection rule and for estimation in the presence of nuisance parameters. Under suitable regularity conditions, such as differentiability, the PSML estimator is a solution to the following score equation:

$$
\sum_{m=1}^M \nabla_{\boldsymbol{\theta}} \log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})\, \mathbf{1}_{\{\mathbf{x} \in \mathcal{A}_m\}} = \mathbf{0}. \tag{29}
$$
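As an illustration of (28), consider the hypothetical toy model of two independent Gaussian populations with known $\sigma$ and $N$ samples each. Assume that the SMS rule selects the larger sample mean. The sample means $\bar{y}_m$ are sufficient in this model. After the change of variables $u = \theta_1 + \theta_2$, $v = \theta_1 - \theta_2$, the penalty in (28) depends only on $v$. Then $\hat{u} = \bar{y}_1 + \bar{y}_2$, and the scaled difference solves a one-dimensional truncated-normal mean equation. This reduction is our own derivation for this toy model, not a general PSML solver. The sketch cross-checks it against direct BFGS maximization of (28) with `scipy.optimize.minimize`. The function names are illustrative.

```python
import numpy as np
from scipy.optimize import brentq, minimize
from scipy.stats import norm


def mills(w):
    """phi(w) / Phi(w), evaluated in log space for numerical stability."""
    return np.exp(norm.logpdf(w) - norm.logcdf(w))


def psml_two_gaussians(ybar, sigma, n):
    """PSML estimate for two Gaussian populations (known sigma, n samples each)
    under an ASSUMED SMS rule that selects the larger sample mean.

    With u = theta_1 + theta_2 and v = theta_1 - theta_2, objective (28) separates:
    u_hat = ybar_1 + ybar_2, while w = +/- v / s (signed toward the selected
    population) solves w + phi(w) / Phi(w) = z, z = |ybar_1 - ybar_2| / s."""
    ybar = np.asarray(ybar, dtype=float)
    s = sigma * np.sqrt(2.0 / n)                # std of ybar_1 - ybar_2
    sign = 1.0 if ybar[0] >= ybar[1] else -1.0  # +1 if population 1 is selected
    z = sign * (ybar[0] - ybar[1]) / s
    if z <= 0.0:
        raise ValueError("tied sample means: (28) has no finite maximizer")

    def h(w):
        return w + mills(w) - z                 # strictly increasing in w

    lo = z - 1.0
    while h(lo) > 0.0:                          # h(z) > 0; widen until sign change
        lo = z - 2.0 * (z - lo)
    w = brentq(h, lo, z)
    u, v = ybar.sum(), sign * s * w
    return np.array([(u + v) / 2.0, (u - v) / 2.0])


def psml_numerical(ybar, sigma, n):
    """Cross-check: maximize the r.h.s. of (28) directly with BFGS."""
    ybar = np.asarray(ybar, dtype=float)
    s = sigma * np.sqrt(2.0 / n)
    sign = 1.0 if ybar[0] >= ybar[1] else -1.0
    a = sign * np.array([1.0, -1.0]) / s        # Pr(Psi = m; theta) = Phi(a @ theta)

    def neg_obj(theta):
        # -(log f(x; theta) - log Pr(Psi = m; theta)), dropping theta-free terms.
        return 0.5 * n / sigma**2 * np.sum((ybar - theta) ** 2) + norm.logcdf(a @ theta)

    def neg_grad(theta):
        return -n / sigma**2 * (ybar - theta) + mills(a @ theta) * a

    return minimize(neg_obj, ybar, jac=neg_grad, method="BFGS").x


ybar, sigma, n = np.array([1.2, 1.0]), 1.0, 10
print(psml_two_gaussians(ybar, sigma, n))
print(psml_numerical(ybar, sigma, n))
```

For $\bar{\mathbf{y}} = [1.2, 1.0]^T$, the PSML pulls the selected component well below its sample mean. This is the correction for the selection bias. In this model the sum $\bar{y}_1 + \bar{y}_2$ is left unchanged.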

The $\Psi$-efficient estimator is defined as follows.

Definition 2: An estimator is said to be a $\Psi$-efficient estimator of $\boldsymbol{\theta}$ if it is a $\Psi$-unbiased estimator that achieves the $\Psi$-CRB.

It should be noted that the equality condition in (12), i.e., the condition for $\Psi$-CRB achievability, is relevant only in the subset $\mathcal{A}_m$, and the estimation errors outside this region can take arbitrary values. Thus, an estimator that satisfies the equality condition in (12), if it exists, is not unique, since by changing this estimator outside the subset $\mathcal{A}_m$ we obtain a new estimator that also attains (12). In particular, the $\Psi$-efficient estimator is not unique. The following theorem describes the relation between the PSML and the $\Psi$-efficient estimators.

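Theorem 2: Assume that the regularity conditions C.1-C.2 are satisfied and that $\hat{\boldsymbol{\theta}}^{(\Psi\text{-eff})}$ is a $\Psi$-efficient estimator, as defined in Definition 2. Then,

$$
\hat{\theta}_m^{(\Psi\text{-eff})} = \hat{\theta}_m^{(\mathrm{PSML})}, \quad \forall m = 1,\dots,M, \ \forall \mathbf{x} \in \mathcal{A}_m \ \text{a.s.} \tag{30}
$$

Proof: According to Definition 2, $\hat{\boldsymbol{\theta}}^{(\Psi\text{-eff})}$ is a $\Psi$-unbiased estimator that achieves the $\Psi$-CRB. According to (12), the estimator that achieves the $\Psi$-CRB satisfies

$$
\hat{\theta}_m^{(\Psi\text{-eff})} = \theta_m + \frac{1}{h_m(\boldsymbol{\theta})} \sum_{l=1}^M \frac{\partial \log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})}{\partial \theta_l} \left[\mathbf{J}_m^{-1}(\boldsymbol{\theta}, \Psi)\right]_{l,m} \tag{31}
$$

a.s. $\forall \mathbf{x} \in \mathcal{A}_m$, for all $m = 1,\dots,M$ and $\boldsymbol{\theta} \in \Omega_{\boldsymbol{\theta}}$. By using (29), it can be concluded that

$$
\frac{\partial \log f(\mathbf{x} \mid \Psi = m; \boldsymbol{\theta})}{\partial \theta_l}\bigg|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{(\mathrm{PSML})}} = 0, \quad \forall l = 1,\dots,M, \tag{32}
$$

for $\mathbf{x} \in \mathcal{A}_m$. Therefore, by substituting $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{(\mathrm{PSML})}$ and (32) in (31), one obtains

$$
\hat{\theta}_m^{(\Psi\text{-eff})} = \hat{\theta}_m^{(\mathrm{PSML})}, \quad \forall m = 1,\dots,M, \ \mathbf{x} \in \mathcal{A}_m.
$$

Thus, (30) is satisfied. ■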

It should be noted that (30) implies that for any observation vector $\mathbf{x} \in \mathcal{A}_m$, the $m$th elements of the $\Psi$-efficient estimator and the PSML estimator are identical, while the other elements may be different. However, these elements have no influence on the PSMSE.


3) MBP: In some instances, the second derivative of the post-selection likelihood function is intractable, so that calculation of $\mathbf{H}^{-1}(\boldsymbol{\theta}, \Psi)$ and $\mathbf{J}^{-1}(\boldsymbol{\theta}, \Psi)$ may be difficult. In such cases, the Newton-Raphson and post-selection Fisher scoring methods are intractable. In [46], an MBP algorithm is proposed that strategically selects a part of the full likelihood function with easily computed second-order derivatives. The remaining, more difficult part of the likelihood function participates in the algorithm in such a way that its second-order derivative is not needed. If the information dominance condition [46] is satisfied, then the MBP estimator converges to the PSML estimator and its asymptotic performance is better than that of the ML estimator.

In the context of the estimation after parameter selection model, according to the r.h.s. of (28), the PSML consists of maximizing the sum of two functions: $\log f(\mathbf{x}; \boldsymbol{\theta})$ and $-\sum_{m=1}^M \log \Pr(\Psi = m; \boldsymbol{\theta})\, \mathbf{1}_{\{\mathbf{x} \in \mathcal{A}_m\}}$. Optimizing the term that involves the probability of selection is usually less tractable than maximizing the likelihood function, and this term may create dependency and coupling between the different parameters. Thus, we use the MBP method [46], such that the $i$th iteration is given by


B. Practical implementations of the PSML 

In practice, an analytical expression for the PSML in (l29b is 
usually unavailable due to the intractability of the probability 
of selection. In the following, we propose three iterative meth¬ 
ods for the implementation of the PSML: 1) Newton-Raphson, 
2) post-selection Fisher scoring, and 3) maximization by parts 
(MBP). These methods are based on the assumptions that the 
post-selection likelihood, /(x|\I/ = to; 9), is a twice continu¬ 
ously differentiable function w.r.t. 9 for any to = 1,... ,M, 
and that a unique solution to the score equation in (|29) exists, 

p-( psml) 

which is the PSML estimator, 6 

1) Newton-Raphson: The Newton-Raphson method for 
solving the post-selection likelihood equation in ( l29l ) is based 
on replacing the objective function on the r.h.s. of ( l28l > by 
its first-order Taylor expansion (see, e.g. Chapter 7 of fill ). 
Therefore, the ith iteration of the Newton-Raphson method is 
given by: 

e {i+l) =e {j) ~Y h-(* w *)x 

m= 1 

V e log/(x|T' =m;9 )| 0= - w ( 33 ) 

Vi = 1, 2,..., where the Hessian matrix is given by 

H m (0, T') = Vg log /(x|'T = to; 9), \/9 £ M. (34) 

2) Post-selection Fisher scoring: Similar to the derivation 
of Fisher scoring for ML estimation lUD . a variation on 
( l33l > is the Fisher scoring method in which the post-selection 
Hessian is replaced by its expected value, — J m (0,'k). Thus, 
the ith iteration of the resulting post-selection Fisher scoring 
procedure is given by: 

« i<+i >=s“» + i: j-> («“>, *)x 

m= 1 

V 0 log/(x|'T = m;0)[ e=d w 1 { xe ^ m }> (35) 
for i = 1,2,..., where the PSFIM is defined in ([7). 


Ve log/(x; 0 )| e _g(<+i) = 

M 

Vg logPr(T f = m;0)\ g ^i) 1 { x&Am } (36) 

m= 1 

~ (0) - (ml) 

and the initial estimate is the ML estimator, i.e., 9 =9 

The advantage of the estimation iteration in (f36l> is that there is 
no need for a second derivative of the probability of selection. 

Asymptotically as N — > oo, the MBP iteration converges 
to the PSML estimator under the “information dominance 
condition” (46) , i.e., if 

11 J _ 1 (0)Ve logPr(T' = to; 0)Vg logPr(T' = to; 9 )\| < 1, 

and the Fisher information is larger than the information 
contained in the probability of selection, where || • || denotes 
the spectral norm. 

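Additional relaxation can be achieved by using the Newton-Raphson method on the l.h.s. of (36), i.e., by

$$
\hat{\boldsymbol{\theta}}^{(i+1)} = \hat{\boldsymbol{\theta}}^{(i)} - \mathbf{H}^{-1}\big(\hat{\boldsymbol{\theta}}^{(i)}\big) \sum_{m=1}^{M} \left.\nabla_{\boldsymbol{\theta}} \log f(\mathbf{x}|\Psi=m;\boldsymbol{\theta})\right|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}^{(i)}} \mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_m\}}, \tag{37}
$$

or by using the Fisher scoring variation:

$$
\hat{\boldsymbol{\theta}}^{(i+1)} = \hat{\boldsymbol{\theta}}^{(i)} + \mathbf{J}^{-1}\big(\hat{\boldsymbol{\theta}}^{(i)}\big) \sum_{m=1}^{M} \left.\nabla_{\boldsymbol{\theta}} \log f(\mathbf{x}|\Psi=m;\boldsymbol{\theta})\right|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}^{(i)}} \mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_m\}}, \tag{38}
$$

where the Hessian matrix in this case is given by $\mathbf{H}(\boldsymbol{\theta}) = \nabla^2_{\boldsymbol{\theta}} \log f(\mathbf{x};\boldsymbol{\theta})$. One merit of the iterative methods in (37) and (38) is that the MBP utilizes the conventional Hessian and FIM to direct the search for the PSML. This is very useful for the classical estimation after selection problem with independent populations, where the conventional Hessian and FIM are diagonal matrices. In Appendix C, an iterative method is proposed for cases with an intractable probability of selection.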
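All four updates, (33), (35), (37), and (38), take a step of the form $\hat{\boldsymbol{\theta}}^{(i+1)} = \hat{\boldsymbol{\theta}}^{(i)} + \mathbf{M}^{-1}(\hat{\boldsymbol{\theta}}^{(i)})\,\mathbf{s}(\hat{\boldsymbol{\theta}}^{(i)})$, where $\mathbf{s}$ is the post-selection score $\sum_{m}\nabla_{\boldsymbol{\theta}}\log f(\mathbf{x}|\Psi=m;\boldsymbol{\theta})\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_m\}}$ at the observed $\mathbf{x}$, and $\mathbf{M}$ is $-\mathbf{H}_m$, $\mathbf{J}_m$, $-\mathbf{H}$, or $\mathbf{J}$, respectively. The Python/NumPy sketch below implements only this shared step; the function name, the user-supplied callables `score` and `info`, and the stopping rule are illustrative assumptions rather than part of the methods described above.

```python
import numpy as np

def scoring_iterations(theta0, score, info, n_iter=100, tol=1e-10):
    """Iterate theta <- theta + info(theta)^{-1} score(theta).

    score(theta): post-selection score at the observed x (hypothetical user callable).
    info(theta):  -H_m for Newton-Raphson (33), J_m for post-selection Fisher
                  scoring (35), -H for (37), or J for (38) (also user-supplied).
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        step = np.linalg.solve(info(theta), score(theta))  # avoids forming an explicit inverse
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta

# Toy check: for the quadratic log-likelihood -0.5 (theta - a)^T A (theta - a), the score
# is A (a - theta) and the negative Hessian is A, so the first step already reaches a.
A = np.array([[2.0, 0.3], [0.3, 1.0]])
a = np.array([1.0, -0.5])
estimate = scoring_iterations(np.zeros(2), lambda t: A @ (a - t), lambda t: A)
```

All model-specific work lives in the two callables, so the same loop serves the Newton-Raphson, Fisher scoring, and relaxed MBP variants.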


V. Examples 

A. Uniform distribution 

Consider the following observation model: 


$$
y_m[n] \sim U[0,\theta_m], \quad n=0,\ldots,N-1,\ m=1,2, \tag{39}
$$


where $U[a,b]$ denotes the continuous uniform distribution on the support $[a,b]$, and the two populations are assumed to be independent. For the selection of the population with the largest maximum, the SMS rule, $\Psi_{\mathrm{SMS}}$, selects the population with the largest sufficient statistic, i.e.,

$$
\Psi_{\mathrm{SMS}} = \arg\max_{m=1,2}\left\{\hat{\theta}_m^{(\mathrm{ML})}\right\}, \tag{40}
$$

where the ML estimator of $\theta_m$ is given by

$$
\hat{\theta}_m^{(\mathrm{ML})} = \max_{n=0,\ldots,N-1}\{y_m[n]\}, \quad m=1,2. \tag{41}
$$


The uniform minimum variance unbiased (MVU) estimator (in the conventional sense) for this problem, without a selection stage, is given by (e.g. HD, P- H5)

$$
\hat{\theta}_m^{(\mathrm{MVU})} = \frac{N+1}{N}\,\hat{\theta}_m^{(\mathrm{ML})}, \quad m=1,2. \tag{42}
$$


While the ML and MVU are $\Psi_{\mathrm{SMS}}$-biased estimators in this case, it is shown analytically in [18] that the U-V estimator, $\hat{\theta}_m^{(\mathrm{U\text{-}V})}$, $m,k=1,2$, $m\neq k$, satisfies

$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\left(\hat{\theta}_m^{(\mathrm{U\text{-}V})}-\theta_m\right)\mathbf{1}_{\{\Psi_{\mathrm{SMS}}=m\}}\right] = 0. \tag{43}
$$

Thus, according to (5), the U-V estimator is a $\Psi_{\mathrm{SMS}}$-unbiased estimator. Surprisingly, this estimator is a function of the sufficient statistics of both independent populations. In this case, the regularity conditions of the likelihood function are not satisfied (e.g. OH, pp. 113-116); thus, the proposed $\Psi$-CRB for any selection rule $\Psi$, as well as the classical CRB itself, does not exist.

The PSMSE of the ML, MVU, and U-V estimators with the SMS rule is evaluated using 250,000 Monte-Carlo simulations and presented in Fig. 3 for $\theta_1=10$ and $\theta_2=10.2$. It can be seen that a $\Psi_{\mathrm{SMS}}$-unbiased estimator, $\hat{\boldsymbol{\theta}}^{(\mathrm{U\text{-}V})}$, exists with a lower PSMSE than that of the ML and MVU estimators for any number of samples, $N$.
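A minimal Monte Carlo sketch of this uniform experiment, assuming Python with NumPy, is given below. It estimates the PSMSE and the per-population $\Psi_{\mathrm{SMS}}$-bias $\mathrm{E}_{\boldsymbol{\theta}}[(\hat{\theta}_m-\theta_m)\mathbf{1}_{\{\Psi_{\mathrm{SMS}}=m\}}]$ of the ML estimator (41) and the MVU estimator (42) under the SMS rule (40). The U-V estimator is not included because its formula is not reproduced in this section; the function name, seed, and trial counts are arbitrary choices.

```python
import numpy as np

def uniform_sms_mc(theta, N, n_mc=100_000, seed=0):
    """Monte Carlo PSMSE and per-population Psi_SMS-bias of the ML (41) and MVU (42)
    estimators under the SMS rule (40), for y_m[n] ~ U[0, theta_m] (indices 0 and 1)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    y = rng.uniform(size=(n_mc, 2, N)) * theta[None, :, None]  # N samples per population
    ml = y.max(axis=2)                                         # (41)
    mvu = (N + 1) / N * ml                                     # (42)
    sel = ml.argmax(axis=1)                                    # (40)
    rows = np.arange(n_mc)
    results = {}
    for name, est in (("ML", ml), ("MVU", mvu)):
        err = est[rows, sel] - theta[sel]      # error of the selected parameter only
        results[name] = {
            "psmse": float(np.mean(err ** 2)),
            "psi_bias": np.array([np.mean(np.where(sel == m, err, 0.0)) for m in (0, 1)]),
        }
    return results

res = uniform_sms_mc([10.0, 10.2], N=5, n_mc=20_000)
```

Because $\hat{\theta}_m^{(\mathrm{ML})}<\theta_m$ with probability one, the ML entries of `psi_bias` are negative, in line with the statement above that the ML estimator is $\Psi_{\mathrm{SMS}}$-biased.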


B. Linear Gaussian model 


Consider the following observation model: 


$$
\begin{cases} y_1[n] = \theta_1 + w_1[n] \\ y_2[n] = \theta_2 + w_2[n] \end{cases}, \quad n=0,\ldots,N-1, \tag{44}
$$



Fig. 3. The performance of the ML, MVU, and U-V estimators for estimation 
after parameter selection with independent uniformly distributed populations 
and the SMS rule. 


where the normally distributed noise vectors, $\mathbf{w}[n]=[w_1[n],w_2[n]]^T$, $n=0,\ldots,N-1$, are independent in time and space and have zero mean and a known covariance matrix, $\boldsymbol{\Sigma}=\begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}$.

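We assume the SMS rule, which selects the population with the largest sample mean, i.e.,

$$
\Psi_{\mathrm{SMS}} = \arg\max_{m=1,2}\left\{\hat{\theta}_m^{(\mathrm{ML})}\right\}, \tag{45}
$$

where

$$
\hat{\theta}_m^{(\mathrm{ML})} = \frac{1}{N}\sum_{n=0}^{N-1} y_m[n] \tag{46}
$$

is the ML estimator of $\theta_m$ for $m=1,2$. According to (46), the ML estimators are jointly Gaussian random variables with means $\theta_1,\theta_2$ and covariance matrix $\frac{1}{N}\boldsymbol{\Sigma}$. Thus, for the SMS rule in (45), the probability of selecting population $m$ is

$$
\Pr(\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta}) = \Pr\left(\hat{\theta}_m^{(\mathrm{ML})}-\hat{\theta}_k^{(\mathrm{ML})}>0;\boldsymbol{\theta}\right) = \Phi(\Delta_m), \tag{47}
$$

for $m,k=1,2$, $m\neq k$, where $\Phi(\cdot)$ denotes the standard normal cumulative distribution function (cdf),

$$
\Delta_m \triangleq \frac{\theta_m-\theta_k}{\sigma}, \quad m,k=1,2,\ m\neq k,
$$

and $\sigma \triangleq \sqrt{\frac{\sigma_1^2+\sigma_2^2}{N}}$.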
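As a quick numerical check of (47), the sketch below (Python with NumPy and the standard-library `math.erf`) evaluates $\Phi(\Delta_m)$ and compares it with a Monte Carlo frequency obtained by simulating the sample means (46) directly, which is legitimate because they are jointly Gaussian with mean $\boldsymbol{\theta}$ and covariance $\frac{1}{N}\boldsymbol{\Sigma}$. The helper names and the parameter values ($\theta_1=0$, $\theta_2=0.1$, $\sigma_1^2=1$, $\sigma_2^2=0.1$, $N=10$) are illustrative assumptions.

```python
import math
import numpy as np

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sms_selection_prob(theta, sigma2, N):
    """Pr(Psi_SMS = m; theta) = Phi(Delta_m) from (47), m = 1, 2 (indices 0, 1)."""
    sigma = math.sqrt((sigma2[0] + sigma2[1]) / N)   # sigma as defined below (47)
    d1 = (theta[0] - theta[1]) / sigma               # Delta_1; Delta_2 = -Delta_1
    return np.array([std_normal_cdf(d1), std_normal_cdf(-d1)])

def sms_selection_prob_mc(theta, sigma2, N, n_mc=200_000, seed=0):
    """Simulate the sample means (46) and apply the SMS rule (45)."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(np.asarray(sigma2, dtype=float) / N)
    means = np.asarray(theta, dtype=float) + std * rng.standard_normal((n_mc, 2))
    sel = means.argmax(axis=1)
    return np.array([np.mean(sel == 0), np.mean(sel == 1)])

p_exact = sms_selection_prob([0.0, 0.1], [1.0, 0.1], N=10)
p_mc = sms_selection_prob_mc([0.0, 0.1], [1.0, 0.1], N=10)
```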

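1) The $\Psi_{\mathrm{SMS}}$-CRB: It can be verified that for this case

$$
\nabla^2_{\boldsymbol{\theta}} \log f(\mathbf{x};\boldsymbol{\theta}) = -N\boldsymbol{\Sigma}^{-1}. \tag{48}
$$

Therefore, by substituting (48) in (18), one obtains

$$
\mathbf{J}_m(\boldsymbol{\theta},\Psi) = N\boldsymbol{\Sigma}^{-1} + \nabla^2_{\boldsymbol{\theta}}\log\Pr(\Psi=m;\boldsymbol{\theta}), \quad m=1,2, \tag{49}
$$

for the selection rule $\Psi$. By substituting (49) in the $\Psi$-CRB expression, the proposed bound is obtained. For the SMS rule, by using (47) and the chain and product rules, it can be verified that

$$
\nabla^2_{\boldsymbol{\theta}}\log\Pr(\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta}) = \frac{c(\Delta_m)}{\sigma^2}\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix} \tag{50}
$$

for all $m=1,2$, where

$$
c(\Delta) \triangleq -\frac{\Delta\,\phi(\Delta)}{\Phi(\Delta)} - \frac{\phi^2(\Delta)}{\Phi^2(\Delta)} \tag{51}
$$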
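and $\phi(\cdot)$ denotes the standard normal pdf. By substituting (50) in (49), one obtains

$$
\mathbf{J}_m^{-1}(\boldsymbol{\theta},\Psi_{\mathrm{SMS}}) = \frac{1}{N}\left(\boldsymbol{\Sigma} - \frac{c(\Delta_m)}{N\sigma^2\left(c(\Delta_m)+1\right)}\,\mathbf{D}\right), \tag{52}
$$

where

$$
\mathbf{D} \triangleq \begin{bmatrix}\sigma_1^4 & -\sigma_1^2\sigma_2^2\\ -\sigma_1^2\sigma_2^2 & \sigma_2^4\end{bmatrix}. \tag{53}
$$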
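The closed form (52)-(53) can be checked against a direct numerical inversion of (49)-(51). In the sketch below, written in Python with NumPy, `psfim_sms` builds $\mathbf{J}_m$ from (49)-(50) and `psfim_inv_closed_form` evaluates (52); both names, the 0-based population index, and the test values are assumptions made for illustration.

```python
import math
import numpy as np

def c_fun(delta):
    """c(Delta) from (51), i.e., the second derivative of log Phi at Delta."""
    pdf = math.exp(-0.5 * delta * delta) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(delta / math.sqrt(2.0)))
    r = pdf / cdf
    return -delta * r - r * r

def psfim_sms(theta, sigma2, N, m):
    """J_m(theta, Psi_SMS) from (49)-(50); m = 0 or 1 is the selected population."""
    s1, s2 = sigma2
    sigma_sq = (s1 + s2) / N
    k = 1 - m
    delta = (theta[m] - theta[k]) / math.sqrt(sigma_sq)                             # Delta_m
    hess_log_pr = c_fun(delta) / sigma_sq * np.array([[1.0, -1.0], [-1.0, 1.0]])   # (50)
    return N * np.diag([1.0 / s1, 1.0 / s2]) + hess_log_pr, delta                   # (49)

def psfim_inv_closed_form(delta, sigma2, N):
    """Closed-form inverse (52) with D from (53)."""
    s1, s2 = sigma2
    sigma_sq = (s1 + s2) / N
    D = np.array([[s1 ** 2, -s1 * s2], [-s1 * s2, s2 ** 2]])
    c = c_fun(delta)
    return (np.diag([s1, s2]) - c / (N * sigma_sq * (c + 1.0)) * D) / N

J1, d1 = psfim_sms([0.0, 0.1], [1.0, 0.1], 10, m=0)
J2, d2 = psfim_sms([0.0, 0.1], [1.0, 0.1], 10, m=1)
```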


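By substituting (52) in the component-wise form of the $\Psi$-CRB, we obtain the $\Psi_{\mathrm{SMS}}$-CRB on the component-wise PSMSE:

$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\left(\hat{\theta}_m-\theta_m\right)^2 \,\middle|\, \Psi_{\mathrm{SMS}}=m\right] \geq \frac{\sigma_m^2}{N}\,\zeta(\Delta_m,\sigma_m,\sigma_k), \tag{54}
$$

$m,k=1,2$ and $m\neq k$, where

$$
\zeta(\Delta,\sigma_m,\sigma_k) \triangleq 1 - \frac{c(\Delta)}{c(\Delta)+1}\,\frac{\sigma_m^2}{\sigma_m^2+\sigma_k^2}. \tag{55}
$$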

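Finally, by substituting (47) and (54) in (10), the $\Psi_{\mathrm{SMS}}$-CRB on the PSMSE is obtained:

$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\sum_{m=1}^{2}\left(\hat{\theta}_m-\theta_m\right)^2\mathbf{1}_{\{\Psi_{\mathrm{SMS}}=m\}}\right] \geq \frac{\sigma_1^2}{N}\Phi(\Delta_1)\,\zeta(\Delta_1,\sigma_1,\sigma_2) + \frac{\sigma_2^2}{N}\Phi(\Delta_2)\,\zeta(\Delta_2,\sigma_2,\sigma_1). \tag{56}
$$

Similarly, the biased $\Psi$-CRB is obtained by substituting (52) and the gradient of the ML $\Psi$-bias from (57)-(58) in (22).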
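The correction factor (55) and the bounds (54) and (56) are simple to evaluate. The sketch below (Python, standard library only) assumes the variances are passed directly, so `zeta(delta, s_m2, s_k2)` takes $\sigma_m^2$ and $\sigma_k^2$ rather than the standard deviations that appear as arguments of $\zeta$ in (55); all function names are hypothetical.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def c_fun(delta):
    """c(Delta) from (51)."""
    r = math.exp(-0.5 * delta * delta) / math.sqrt(2.0 * math.pi) / std_normal_cdf(delta)
    return -delta * r - r * r

def zeta(delta, s_m2, s_k2):
    """Correction factor (55), taking the variances sigma_m^2 and sigma_k^2."""
    c = c_fun(delta)
    return 1.0 - c / (c + 1.0) * s_m2 / (s_m2 + s_k2)

def psi_sms_crb(theta, sigma2, N):
    """Component-wise bounds (54) and the PSMSE bound (56) for two Gaussian populations."""
    s1, s2 = sigma2
    sigma = math.sqrt((s1 + s2) / N)
    d1 = (theta[0] - theta[1]) / sigma   # Delta_1
    d2 = -d1                             # Delta_2
    comp = (s1 / N * zeta(d1, s1, s2), s2 / N * zeta(d2, s2, s1))         # (54)
    total = std_normal_cdf(d1) * comp[0] + std_normal_cdf(d2) * comp[1]   # (56)
    return comp, total

comp, total = psi_sms_crb([0.0, 0.1], [1.0, 0.1], 10)
```

Since $c(\Delta)$ lies in $(-1,0)$, the factor is always above one, and it approaches one as $\Delta$ grows, matching the discussion of Fig. 4 that follows.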

It is well known that the conventional CRB on estimating the $m$th parameter, $\theta_m$, without a selection stage is given by $\frac{\sigma_m^2}{N}$ (e.g. OH pp. 31-32). Therefore, the component-wise $\Psi_{\mathrm{SMS}}$-CRB in (54) is equal to the conventional marginal CRB multiplied by a correction factor, $\zeta(\Delta,\sigma_1,\sigma_2)$, as defined in (55). The correction factor is presented in Fig. 4 versus $\Delta$ and $\sigma_1^2$ for $N=10$ and $\sigma_2^2=1$. It can be seen that for $\Delta<0$, the correction factor increases as $\Delta$ decreases, because this situation occurs when the order relation of the sample means is wrong, which is an indication of high estimation error. Similarly, the correction factor increases as the variance $\sigma_1^2$ increases. In contrast, for $c(\Delta)\approx 0$, which occurs when $\Delta\gg 0$, the correction factor satisfies $\zeta(\Delta,\sigma_1,\sigma_2)\to 1$, and thus the component-wise $\Psi_{\mathrm{SMS}}$-CRB converges to the CRB, i.e., the selection stage has only a minor influence on the estimation stage. It should be noted, however, that the CRB and the $\Psi_{\mathrm{SMS}}$-CRB are lower bounds on different performance measures, i.e., MSE and PSMSE, and on different groups of estimators, i.e., mean-unbiased and $\Psi_{\mathrm{SMS}}$-unbiased estimators, respectively.

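2) Estimation: In m it is shown that there is no $\Psi_{\mathrm{SMS}}$-unbiased estimator of $\theta_1$ and $\theta_2$. It is also shown that the ML estimator satisfies

$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\hat{\theta}_m^{(\mathrm{ML})}-\theta_m \,\middle|\, \Psi_{\mathrm{SMS}}=m\right] = \frac{\sigma_m^2}{N\sigma}\,\frac{\phi(\Delta_m)}{\Phi(\Delta_m)} \tag{57}
$$

and

$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\hat{\theta}_k^{(\mathrm{ML})}-\theta_k \,\middle|\, \Psi_{\mathrm{SMS}}=m\right] = -\frac{\sigma_k^2}{N\sigma}\,\frac{\phi(\Delta_m)}{\Phi(\Delta_m)}, \tag{58}
$$

$m,k=1,2$, $m\neq k$. This result indicates that the ML estimator tends to overestimate the unknown parameter of the selected population and to underestimate the unknown parameter of the unselected population.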
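A Monte Carlo check of (57)-(58), again assuming Python with NumPy and simulating the sample means directly, is sketched below; `ml_psi_bias_closed_form` and `ml_psi_bias_mc` are hypothetical names, and only the event $\Psi_{\mathrm{SMS}}=1$ is treated (the case $\Psi_{\mathrm{SMS}}=2$ follows by swapping indices).

```python
import math
import numpy as np

def ml_psi_bias_closed_form(theta, sigma2, N):
    """(57) for the selected population 1 and (58) for population 2, given Psi_SMS = 1."""
    s1, s2 = sigma2
    sigma = math.sqrt((s1 + s2) / N)
    d1 = (theta[0] - theta[1]) / sigma
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))
    return np.array([s1, -s2]) / (N * sigma) * (pdf / cdf)

def ml_psi_bias_mc(theta, sigma2, N, n_mc=400_000, seed=1):
    """Empirical E[theta_ML - theta | Psi_SMS = 1] from simulated sample means (46)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    std = np.sqrt(np.asarray(sigma2, dtype=float) / N)
    ml = theta + std * rng.standard_normal((n_mc, 2))
    selected_1 = ml[:, 0] > ml[:, 1]          # event Psi_SMS = 1
    return (ml[selected_1] - theta).mean(axis=0)

closed = ml_psi_bias_closed_form([0.0, 0.1], [1.0, 0.1], 10)
empirical = ml_psi_bias_mc([0.0, 0.1], [1.0, 0.1], 10)
```

The positive first entry and negative second entry reproduce the over- and underestimation effect described above.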

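By using the model in (44) and the selection probability in (47), we obtain the following gradient of the post-selection log-likelihood function for the SMS rule:

$$
\nabla_{\boldsymbol{\theta}}\log f(\mathbf{x}|\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta}) = N\boldsymbol{\Sigma}^{-1}\left(\hat{\boldsymbol{\theta}}^{(\mathrm{ML})}-\boldsymbol{\theta}\right) - \frac{\phi(\Delta_m)}{\sigma\Phi(\Delta_m)}\begin{bmatrix}\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_1\}}-\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_2\}}\\ \mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_2\}}-\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_1\}}\end{bmatrix}, \tag{59}
$$

where $\hat{\boldsymbol{\theta}}^{(\mathrm{ML})}=[\hat{\theta}_1^{(\mathrm{ML})},\hat{\theta}_2^{(\mathrm{ML})}]^T$ is defined in (46). According to (29), by equating the r.h.s. of (59) to zero, we obtain the PSML estimator for $\mathbf{x}\in\mathcal{A}_m$:

$$
\hat{\boldsymbol{\theta}}^{(\mathrm{PSML})} = \hat{\boldsymbol{\theta}}^{(\mathrm{ML})} - \frac{1}{N\sigma}\,\frac{\phi\big(\hat{\Delta}_m^{(\mathrm{PSML})}\big)}{\Phi\big(\hat{\Delta}_m^{(\mathrm{PSML})}\big)}\begin{bmatrix}\sigma_1^2\left(\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_1\}}-\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_2\}}\right)\\ \sigma_2^2\left(\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_2\}}-\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_1\}}\right)\end{bmatrix}, \tag{60}
$$

where

$$
\hat{\Delta}_m^{(\mathrm{PSML})} \triangleq \frac{\hat{\theta}_m^{(\mathrm{PSML})}-\hat{\theta}_k^{(\mathrm{PSML})}}{\sigma}, \quad m,k=1,2,\ m\neq k,
$$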

for any $\mathbf{x}\in\mathcal{A}_m$. It can be seen that as $\sigma^2$ increases, the correction term on the r.h.s. of (60) becomes insignificant and the PSML estimator approaches the ML estimator.

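The solution of (60) can be found by an exhaustive search over $\hat{\boldsymbol{\theta}}^{(\mathrm{PSML})}$ or by using the iterative methods from Section IV-B. It can be verified that the Newton-Raphson and post-selection Fisher scoring methods coincide in this case, since the post-selection Hessian, $-N\boldsymbol{\Sigma}^{-1}-\nabla^2_{\boldsymbol{\theta}}\log\Pr(\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta})$, does not depend on $\mathbf{x}$ given the selection. The $i$th iteration of the Newton-Raphson PSML (NR-PSML) is obtained by substituting (59) and (52) in (35). Similarly, by substituting (47), (48), and (50) in (36), the MBP estimator is obtained:

$$
\hat{\boldsymbol{\theta}}^{(i+1)} = \hat{\boldsymbol{\theta}}^{(\mathrm{ML})} - \frac{\phi\big(\Delta_m^{(i)}\big)}{N\sigma\,\Phi\big(\Delta_m^{(i)}\big)}\begin{bmatrix}\sigma_1^2\left(\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_1\}}-\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_2\}}\right)\\ \sigma_2^2\left(\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_2\}}-\mathbf{1}_{\{\mathbf{x}\in\mathcal{A}_1\}}\right)\end{bmatrix}, \tag{61}
$$

where $\Delta_m^{(i)} \triangleq (\hat{\theta}_m^{(i)}-\hat{\theta}_k^{(i)})/\sigma$. This coincides with the results in [48] for the Gaussian case.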
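A sketch of the MBP iteration (61) in Python with NumPy follows. The name `mbp_psml_gaussian` and the stopping rule are assumptions; the inputs are the observed sample means (46), and the selected index is taken as their argmax, as in (45). Writing $r(\Delta)=\phi(\Delta)/\Phi(\Delta)$ and using $N\sigma^2=\sigma_1^2+\sigma_2^2$, (61) implies $\Delta_m^{(i+1)}=\hat{\Delta}_m^{(\mathrm{ML})}-r(\Delta_m^{(i)})$ with $\hat{\Delta}_m^{(\mathrm{ML})}=(\hat{\theta}_m^{(\mathrm{ML})}-\hat{\theta}_k^{(\mathrm{ML})})/\sigma$; since $|r'(\Delta)|<1$, this scalar recursion is a contraction, which is why the plain loop below settles.

```python
import math
import numpy as np

def mills(d):
    """r(d) = phi(d) / Phi(d) for the standard normal."""
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * (1.0 + math.erf(d / math.sqrt(2.0))))

def mbp_psml_gaussian(theta_ml, sigma2, N, n_iter=1000, tol=1e-12):
    """MBP iteration (61) for two Gaussian populations and the SMS rule."""
    theta_ml = np.asarray(theta_ml, dtype=float)
    s2 = np.asarray(sigma2, dtype=float)
    sigma = math.sqrt(s2.sum() / N)
    m = int(np.argmax(theta_ml))       # selected population, as in (45)
    k = 1 - m
    sign = np.zeros(2)
    sign[m], sign[k] = 1.0, -1.0       # indicator vector of (59)-(61) for x in A_m
    theta = theta_ml.copy()            # initialization at the ML estimator
    for _ in range(n_iter):
        delta = (theta[m] - theta[k]) / sigma
        new = theta_ml - mills(delta) / (N * sigma) * s2 * sign   # (61)
        converged = np.max(np.abs(new - theta)) < tol
        theta = new
        if converged:
            break
    return theta

theta_ml = np.array([0.3, 0.1])
estimate = mbp_psml_gaussian(theta_ml, [1.0, 0.1], N=10)
```

At convergence the iterate satisfies (60), i.e., it is a stationary point of the post-selection likelihood.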

The bias and PSMSE of the ML, NR-PSML, and MBP estimators with the SMS rule are evaluated using 20,000 Monte-Carlo simulations, and the results are presented in Figs. 5 and 6, respectively, for $\theta_1=0$, $\theta_2=0.1$, $\sigma_1^2=1$, and $\sigma_2^2=0.1$. The PSMSE performance is compared to the $\Psi$-CRB from (56) and the biased $\Psi$-CRB. It can be seen that the PSML methods have a lower $\Psi$-bias and a lower PSMSE than the ML estimator, and that the MBP estimator has the best performance in both terms. Since no $\Psi$-unbiased estimator exists, the $\Psi_{\mathrm{SMS}}$-CRB is higher than the actual PSMSE values, but the biased $\Psi$-CRB is a valid bound for any $N$. However, it can be seen that the $\Psi_{\mathrm{SMS}}$-CRB gives an indication of the performance behavior, and that asymptotically it is attained by the PSML estimator and coincides with the biased $\Psi$-CRB.



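Fig. 5. The $\Psi_{\mathrm{SMS}}$-bias of the ML, NR-PSML, and MBP estimators for estimation after parameter selection with independent Gaussian distributed populations and the SMS rule.

Fig. 6. The PSMSE of the ML, NR-PSML, and MBP estimators and the $\Psi$-CRB for estimation after parameter selection with independent Gaussian distributed populations and the SMS rule.

C. Exponential distribution

Consider the following observation model:

$$
f_m(y_m[n];\theta_m) = \begin{cases}\frac{1}{\theta_m}e^{-\frac{y_m[n]}{\theta_m}}, & 0<y_m[n]\\ 0, & \text{otherwise}\end{cases}, \quad m=1,2, \tag{62}
$$

for all $n=0,\ldots,N-1$, where the parameters $\theta_m>0$ for all $m=1,2$ are unknown. The populations are assumed to be independent. In this problem, the SMS rule selects the population with the largest sample mean, i.e.,

$$
\Psi_{\mathrm{SMS}} = \arg\max_{m=1,2}\left\{\hat{\theta}_m^{(\mathrm{ML})}\right\}, \tag{63}
$$

where

$$
\hat{\theta}_m^{(\mathrm{ML})} = \frac{1}{N}\sum_{n=0}^{N-1} y_m[n] \tag{64}
$$

is the ML estimator of $\theta_m$ for $m=1,2$. The probability of selecting population $m$ via the SMS rule is given by the negative binomial cdf ED:

$$
\Pr(\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta}) = \sum_{j=0}^{N-1}\binom{N+j-1}{j} q_m^N (1-q_m)^j \tag{65}
$$

for any $\theta_m,\theta_k>0$, where $q_m \triangleq \frac{\theta_m}{\theta_m+\theta_k}$.

1) Estimation: The ML estimator from (64) is a $\Psi_{\mathrm{SMS}}$-biased estimator in this case, since El

$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\hat{\theta}_m^{(\mathrm{ML})}-\theta_m \,\middle|\, \Psi_{\mathrm{SMS}}=m\right] = \theta_m\xi_m \tag{66}
$$

for any $\theta_m,\theta_k>0$, where $m,k=1,2$, $m\neq k$, and

$$
\xi_m \triangleq \frac{1}{\Pr(\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta})}\binom{2N-1}{N} q_m^N (1-q_m)^N.
$$

In addition, by using (66) it can be verified that

$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\hat{\theta}_k^{(\mathrm{ML})}-\theta_k \,\middle|\, \Psi_{\mathrm{SMS}}=m\right] = -\theta_k\xi_m \tag{67}
$$

for any $\theta_m,\theta_k>0$, where $m,k=1,2$, $m\neq k$. The $\Psi_{\mathrm{SMS}}$-biases in (66) and (67) are positive and negative, respectively; thus, the ML estimator tends to overestimate the parameter of the selected population and to underestimate that of the unselected one. As an alternative to the ML estimator, the following U-V estimator is proposed in [16]:

$$
\hat{\theta}_m^{(\mathrm{U\text{-}V})} = \hat{\theta}_m^{(\mathrm{ML})} - \hat{\theta}_k^{(\mathrm{ML})}\left(\frac{\hat{\theta}_k^{(\mathrm{ML})}}{\hat{\theta}_m^{(\mathrm{ML})}}\right)^{N-1}, \quad m,k=1,2,\ m\neq k. \tag{68}
$$

It is also shown that $\mathrm{E}_{\boldsymbol{\theta}}\big[(\hat{\theta}_m^{(\mathrm{U\text{-}V})}-\theta_m)\mathbf{1}_{\{\Psi_{\mathrm{SMS}}=m\}}\big]=0$, and thus, according to (5), the U-V estimator is a $\Psi_{\mathrm{SMS}}$-unbiased estimator.

In the following, we derive the PSML estimator for $\Psi_{\mathrm{SMS}}=m$. The results for $\Psi_{\mathrm{SMS}}=k$, $k\neq m$, are straightforward. For the sake of simplicity, the elements of $\boldsymbol{\theta}$ are reordered such that the first element is the selected one, i.e., $\boldsymbol{\theta}=[\theta_m,\theta_k]^T$. The PSML estimator from (28) maximizes the post-selection likelihood, which is given in this case by

$$
\log f(\mathbf{x}|\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta}) = -N\log\theta_m - N\log\theta_k - \frac{N\hat{\theta}_m^{(\mathrm{ML})}}{\theta_m} - \frac{N\hat{\theta}_k^{(\mathrm{ML})}}{\theta_k} - \log\left(\sum_{j=0}^{N-1}\binom{N+j-1}{j} q_m^N (1-q_m)^j\right) \tag{69}
$$

for any $\theta_m,\theta_k>0$ and $\mathbf{x}\in\mathcal{A}_m$. By using (69), it can be verified that the gradient vector of $\log f(\mathbf{x}|\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta})$ w.r.t. $\boldsymbol{\theta}$ is given by

$$
\nabla_{\boldsymbol{\theta}}\log f(\mathbf{x}|\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta}) = N\begin{bmatrix}\dfrac{\hat{\theta}_m^{(\mathrm{ML})}-\theta_m+\theta_m f(q_m)}{\theta_m^2}\\[2mm] \dfrac{\hat{\theta}_k^{(\mathrm{ML})}-\theta_k-\theta_k f(q_m)}{\theta_k^2}\end{bmatrix} \tag{70}
$$

for any $\theta_m,\theta_k>0$, $m,k=1,2$, $m\neq k$, where

$$
f(q_m) \triangleq (1-q_m)\left(q_m h(N,q_m)-1\right) \tag{71}
$$

and

$$
h(N,q_m) \triangleq \frac{q_m^N\sum_{j=0}^{N-2}\binom{N+j}{j}(1-q_m)^j}{\Pr(\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta})}, \quad m=1,2. \tag{72}
$$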
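A numerical sanity check of (65) and of the $\Psi_{\mathrm{SMS}}$-biases (66)-(67) is sketched below, assuming Python 3.8+ (for `math.comb`) and NumPy. The sample mean of $N$ i.i.d. exponentials with mean $\theta_m$ is Gamma distributed with shape $N$ and scale $\theta_m/N$, so the sketch simulates it directly. The function names and test values are hypothetical.

```python
import math
import numpy as np

def sel_prob_exp(theta_m, theta_k, N):
    """Pr(Psi_SMS = m; theta) from (65), with q_m = theta_m / (theta_m + theta_k)."""
    q = theta_m / (theta_m + theta_k)
    return sum(math.comb(N + j - 1, j) * q ** N * (1 - q) ** j for j in range(N))

def xi(theta_m, theta_k, N):
    """xi_m from (66): the conditional relative Psi_SMS-bias of the ML estimator."""
    q = theta_m / (theta_m + theta_k)
    return math.comb(2 * N - 1, N) * q ** N * (1 - q) ** N / sel_prob_exp(theta_m, theta_k, N)

def exp_sms_mc(theta, N, n_mc=400_000, seed=2):
    """Empirical Pr(Psi_SMS = 1) and E[theta_ML - theta | Psi_SMS = 1] (index 0 selected)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    ml = rng.gamma(shape=N, scale=theta / N, size=(n_mc, 2))   # sample means (64)
    selected_1 = ml[:, 0] > ml[:, 1]
    return selected_1.mean(), (ml[selected_1] - theta).mean(axis=0)

p_hat, bias_hat = exp_sms_mc([2.0, 1.5], N=3)
```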


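According to (29), by equating the r.h.s. of (70) to zero, we obtain the PSML estimator for $\mathbf{x}\in\mathcal{A}_m$:

$$
\begin{bmatrix}\hat{\theta}_m^{(\mathrm{PSML})}\\ \hat{\theta}_k^{(\mathrm{PSML})}\end{bmatrix} = \begin{bmatrix}\dfrac{\hat{\theta}_m^{(\mathrm{ML})}}{1-f(\hat{q}_m)}\\[2mm] \dfrac{\hat{\theta}_k^{(\mathrm{ML})}}{1+f(\hat{q}_m)}\end{bmatrix}, \tag{73}
$$

where $\hat{q}_m = \frac{\hat{\theta}_m^{(\mathrm{PSML})}}{\hat{\theta}_m^{(\mathrm{PSML})}+\hat{\theta}_k^{(\mathrm{PSML})}}$ and $\mathbf{x}\in\mathcal{A}_m$. Equation (73) implies that the ratios $\frac{\hat{\theta}_m^{(\mathrm{PSML})}}{\hat{\theta}_m^{(\mathrm{ML})}}$ and $\frac{\hat{\theta}_k^{(\mathrm{PSML})}}{\hat{\theta}_k^{(\mathrm{ML})}}$ are only functions of the statistic $\frac{\hat{\theta}_m^{(\mathrm{ML})}}{\hat{\theta}_k^{(\mathrm{ML})}}$. That is, the PSML estimator is the ML estimator multiplied by a correction factor, which is a function of the ML estimators' ratio.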

The solution to (73) can be found for the general case by an exhaustive search over $\hat{\boldsymbol{\theta}}^{(\mathrm{PSML})}$ or by using the iterative methods from Section IV-B. For example, by substituting (65), (48), and (50) in (36), the $i$th iteration of the MBP method is obtained.
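The sketch below solves (73) by a plain fixed-point iteration started at the ML estimates; this simple scheme is an illustrative choice rather than one of the methods of Section IV-B, and it is not claimed to converge for every data set. It relies on the identity $f(q_m)=-\xi_m$, which follows because the post-selection score (70) has zero conditional mean given $\Psi_{\mathrm{SMS}}=m$, so comparing its first element with (66) gives $\theta_m\xi_m+\theta_m f(q_m)=0$. Function names are hypothetical; Python 3.8+ is assumed.

```python
import math

def f_corr(q, N):
    """f(q_m) in (70), evaluated as -xi_m (see the lead-in)."""
    p_sel = sum(math.comb(N + j - 1, j) * q ** N * (1 - q) ** j for j in range(N))
    return -math.comb(2 * N - 1, N) * q ** N * (1 - q) ** N / p_sel

def psml_exp_fixed_point(ml_m, ml_k, N, n_iter=10_000, tol=1e-12):
    """Fixed-point iteration on (73); ml_m > ml_k are the selected and unselected sample means."""
    tm, tk = ml_m, ml_k                      # start at the ML estimates
    for _ in range(n_iter):
        f = f_corr(tm / (tm + tk), N)        # f evaluated at the current q_m
        new_m, new_k = ml_m / (1.0 - f), ml_k / (1.0 + f)
        converged = abs(new_m - tm) + abs(new_k - tk) < tol
        tm, tk = new_m, new_k
        if converged:
            break
    return tm, tk

tm1, tk1 = psml_exp_fixed_point(3.0, 1.0, N=1)
tm3, tk3 = psml_exp_fixed_point(2.5, 1.0, N=3)
```

For $N=1$ this reproduces the closed form (75) given below: $y_m[0]=3$ and $y_k[0]=1$ yield $(2,2)$.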

2) $\Psi_{\mathrm{SMS}}$-CRB: The PSFIM can be obtained by using the derivative of (70), applying the expectation operator, and using (66) and (67). Then, the $\Psi_{\mathrm{SMS}}$-CRB is obtained by substituting the PSFIM in (10). The explicit $\Psi_{\mathrm{SMS}}$-CRB is omitted from this paper due to space limitations.
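Since the explicit bound is omitted, a Monte Carlo approximation may be useful. The Python/NumPy sketch below estimates $\mathbf{J}_m=\mathrm{E}_{\boldsymbol{\theta}}[\mathbf{s}\mathbf{s}^T|\Psi_{\mathrm{SMS}}=m]$, with $\mathbf{s}$ the score (70) written with $f(q_m)=-\xi_m$, and returns $\sum_m\Pr(\Psi_{\mathrm{SMS}}=m;\boldsymbol{\theta})[\mathbf{J}_m^{-1}]_{1,1}$ in the reordered coordinates $[\theta_m,\theta_k]^T$. It assumes that (10) takes this probability-weighted form, as (56) does in the Gaussian case; the function name, seed, and sample size are illustrative.

```python
import math
import numpy as np

def xi_and_prob(q, N):
    """Returns (xi_m, Pr(Psi_SMS = m)) from (65)-(66) as functions of q_m."""
    p_sel = sum(math.comb(N + j - 1, j) * q ** N * (1 - q) ** j for j in range(N))
    return math.comb(2 * N - 1, N) * q ** N * (1 - q) ** N / p_sel, p_sel

def psi_crb_exp_mc(theta, N, n_mc=400_000, seed=3):
    """Monte Carlo Psi_SMS-CRB for the exponential model (62) and the SMS rule."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    ml = rng.gamma(shape=N, scale=theta / N, size=(n_mc, 2))   # sample means (64)
    bound = 0.0
    for m in (0, 1):
        k = 1 - m
        xi_m, p_sel = xi_and_prob(theta[m] / (theta[m] + theta[k]), N)
        sel = ml[:, m] > ml[:, k]                              # realizations in A_m
        score = np.column_stack((
            N * (ml[sel, m] - theta[m] - theta[m] * xi_m) / theta[m] ** 2,   # (70), f = -xi
            N * (ml[sel, k] - theta[k] + theta[k] * xi_m) / theta[k] ** 2,
        ))
        J = score.T @ score / score.shape[0]   # E[s s^T | Psi_SMS = m], order [theta_m, theta_k]
        bound += p_sel * np.linalg.inv(J)[0, 0]
    return bound

b1 = psi_crb_exp_mc([5.0, 3.0], N=1)
```

For $N=1$ and $\boldsymbol{\theta}=[5,3]^T$, the estimate can be compared with the closed form (74), $(5^3+3^3)/(5+3)=19$.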

3) $\Psi_{\mathrm{SMS}}$-efficiency: For the special case of $N=1$, it can be shown that the $\Psi_{\mathrm{SMS}}$-CRB is given by

$$
B^{(\Psi_{\mathrm{SMS}})}(\boldsymbol{\theta}) = \frac{\theta_m^3+\theta_k^3}{\theta_m+\theta_k}. \tag{74}
$$

The PSML estimator in this case is given by

$$
\begin{bmatrix}\hat{\theta}_m^{(\mathrm{PSML})}\\ \hat{\theta}_k^{(\mathrm{PSML})}\end{bmatrix} = \begin{bmatrix} y_m[0]-y_k[0]\\[1mm] \dfrac{y_k[0]\left(y_m[0]-y_k[0]\right)}{y_m[0]-2y_k[0]}\end{bmatrix} \tag{75}
$$

for any $y_m[0]>y_k[0]$ and $y_m[0]\neq 2y_k[0]$. For $y_k[0]>y_m[0]$, we can change the roles of $\theta_m$ and $\theta_k$ to obtain the PSML estimator. It can be seen that for $N=1$, the PSML and U-V estimators of the selected parameter from (75) and (68), respectively, coincide, i.e., $\hat{\theta}_m^{(\mathrm{PSML})}=\hat{\theta}_m^{(\mathrm{U\text{-}V})}$. Thus, the PSML estimator is a $\Psi_{\mathrm{SMS}}$-unbiased estimator for $N=1$. In addition, we can verify analytically that the PSMSE of $\hat{\boldsymbol{\theta}}^{(\mathrm{PSML})}$ attains the $\Psi_{\mathrm{SMS}}$-CRB from (74). Thus, $\hat{\boldsymbol{\theta}}^{(\mathrm{PSML})}$ is a $\Psi_{\mathrm{SMS}}$-efficient estimator for $N=1$.

The PSMSE performance of the estimators $\hat{\boldsymbol{\theta}}^{(\mathrm{ML})}$ and $\hat{\boldsymbol{\theta}}^{(\mathrm{PSML})}$ with the SMS rule is evaluated using 100,000 Monte-Carlo simulations and compared with the $\Psi_{\mathrm{SMS}}$-CRB for $N=1$ and $\theta_1=5$. The results are presented in Figs. 7 and 8. It can be seen that the PSML estimator is a $\Psi$-unbiased estimator and has a lower PSMSE than the ML estimator. Moreover, the $\Psi_{\mathrm{SMS}}$-CRB is achievable by $\hat{\boldsymbol{\theta}}^{(\mathrm{PSML})}$, which is a $\Psi_{\mathrm{SMS}}$-efficient estimator in this case.

Fig. 7. The $\Psi_{\mathrm{SMS}}$-bias of the ML and PSML estimators for estimation after parameter selection with independent exponential distributed populations and the SMS rule.

Fig. 8. The PSMSE of the ML and PSML estimators and the $\Psi$-CRB for estimation after parameter selection with independent exponential distributed populations and the SMS rule.

VI. Conclusion


In this paper, the concept of non-Bayesian estimation after parameter selection is introduced, and $\Psi$-unbiasedness in the Lehmann sense is defined for arbitrary data-driven parameter selection rules. We derive a Cramer-Rao-type bound for the selected deterministic parameters. Unlike the conventional CRB, the proposed $\Psi$-CRB provides a valid bound in estimation after parameter selection problems. The PSML estimator is proposed, and its properties and practical implementation aspects are discussed. In particular, it is proved that if there exists a $\Psi$-efficient estimator, then it is produced by the PSML estimator. The new paradigm opens a wide range of interesting directions, such as multistage procedures that involve active learning and sequential data sampling.


Appendix A: Proof of Proposition Q] 

In this Appendix, we prove Proposition Q] in a similar way to 
the proof of mean-unbiasedness under a conventional squared-error cost function (Page 11 in 133). By substituting the PSSE cost function in the definition of Lehmann-unbiasedness, the Lehmann-unbiasedness condition is given by


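$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\sum_{m=1}^{M}\left(\hat{\theta}_m-\eta_m\right)^2\mathbf{1}_{\{\Psi=m\}}\right] \geq \mathrm{E}_{\boldsymbol{\theta}}\left[\sum_{m=1}^{M}\left(\hat{\theta}_m-\theta_m\right)^2\mathbf{1}_{\{\Psi=m\}}\right], \quad \forall\boldsymbol{\theta},\boldsymbol{\eta}\in\mathbb{R}^M, \tag{76}
$$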
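where $\boldsymbol{\eta}=[\eta_1,\ldots,\eta_M]^T$ is an arbitrary vector. The condition in (76) can be rewritten as

$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\sum_{m=1}^{M}\left(\hat{\theta}_m-\theta_m+\theta_m-\eta_m\right)^2\mathbf{1}_{\{\Psi=m\}}\right] \geq \mathrm{E}_{\boldsymbol{\theta}}\left[\sum_{m=1}^{M}\left(\hat{\theta}_m-\theta_m\right)^2\mathbf{1}_{\{\Psi=m\}}\right], \quad \forall\boldsymbol{\theta},\boldsymbol{\eta}\in\mathbb{R}^M. \tag{77}
$$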


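By using the linearity of the expectation operator and the fact that $\boldsymbol{\theta}$ and $\boldsymbol{\eta}$ are deterministic vectors, it can be verified that (77) is equivalent to

$$
\sum_{m=1}^{M}\left(\theta_m-\eta_m\right)\mathrm{E}_{\boldsymbol{\theta}}\left[\left(\hat{\theta}_m-\theta_m\right)\mathbf{1}_{\{\Psi=m\}}\right] \geq -\frac{1}{2}\sum_{m=1}^{M}\left(\theta_m-\eta_m\right)^2\Pr(\Psi=m;\boldsymbol{\theta}), \tag{78}
$$

$\forall\boldsymbol{\theta},\boldsymbol{\eta}\in\mathbb{R}^M$, where we used $\mathrm{E}_{\boldsymbol{\theta}}[\mathbf{1}_{\{\Psi=m\}}]=\Pr(\Psi=m;\boldsymbol{\theta})$.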

Sufficient condition: It can be verified that if (5) holds, then the inequality in (78) holds, since the l.h.s. of (78) is then zero and the r.h.s. of (78) is nonpositive.

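Necessary condition: The necessity of (5) is proven by using specific choices of $\boldsymbol{\eta}$. By substituting $\eta_m=\theta_m$ for all $m=1,\ldots,M$, $m\neq l$, and $\eta_l=\theta_l+\varepsilon$ in (78), we obtain the following necessary condition:

$$
-\varepsilon\,\mathrm{E}_{\boldsymbol{\theta}}\left[\left(\hat{\theta}_l-\theta_l\right)\mathbf{1}_{\{\Psi=l\}}\right] \geq -\frac{\varepsilon^2}{2}\Pr(\Psi=l;\boldsymbol{\theta}) \tag{79}
$$

for any $\boldsymbol{\theta}\in\mathbb{R}^M$, $\varepsilon\in\mathbb{R}$. Since $\varepsilon$ can be either positive or negative, considering $\varepsilon>0$ and $\varepsilon<0$ separately and letting $\varepsilon\to 0$ shows that $\mathrm{E}_{\boldsymbol{\theta}}[(\hat{\theta}_l-\theta_l)\mathbf{1}_{\{\Psi=l\}}]$ is both nonpositive and nonnegative. Thus, the condition in (79) implies (5) for any $l=1,\ldots,M$. In addition, since

$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\left(\hat{\theta}_m-\theta_m\right)\mathbf{1}_{\{\Psi=m\}}\right] = \mathrm{E}_{\boldsymbol{\theta}}\left[\hat{\theta}_m-\theta_m \,\middle|\, \Psi=m\right]\Pr(\Psi=m;\boldsymbol{\theta}),
$$

$m=1,\ldots,M$, then for any $m=1,\ldots,M$ such that $\Pr(\Psi=m;\boldsymbol{\theta})\neq 0$, the condition in (5) is equivalent to $\mathrm{E}_{\boldsymbol{\theta}}[\hat{\theta}_m-\theta_m|\Psi=m]=0$.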


Appendix B. Proof of LemmaQ] 

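In this Appendix, two alternative formulations of the PSFIM are derived. Similar derivations can be found in [26] in the context of post-detection estimation. Since $f(\mathbf{x}|\Psi=m;\boldsymbol{\theta})=f(\mathbf{x};\boldsymbol{\theta})/\Pr(\Psi=m;\boldsymbol{\theta})$ for $\mathbf{x}\in\mathcal{A}_m$, one obtains

$$
\nabla_{\boldsymbol{\theta}}\log f(\mathbf{x}|\Psi=m;\boldsymbol{\theta}) = \nabla_{\boldsymbol{\theta}}\log f(\mathbf{x};\boldsymbol{\theta}) - \nabla_{\boldsymbol{\theta}}\log\Pr(\Psi=m;\boldsymbol{\theta}), \tag{80}
$$

$\forall\mathbf{x}\in\mathcal{A}_m$, and by substituting (80) in (7), one obtains

$$
\begin{aligned}
\mathbf{J}_m(\boldsymbol{\theta},\Psi) =\ & \mathrm{E}_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}}\log f(\mathbf{x};\boldsymbol{\theta})\nabla^T_{\boldsymbol{\theta}}\log f(\mathbf{x};\boldsymbol{\theta}) \,\middle|\, \Psi=m\right] \\
& - \nabla_{\boldsymbol{\theta}}\log\Pr(\Psi=m;\boldsymbol{\theta})\,\mathrm{E}_{\boldsymbol{\theta}}\left[\nabla^T_{\boldsymbol{\theta}}\log f(\mathbf{x};\boldsymbol{\theta}) \,\middle|\, \Psi=m\right] \\
& - \mathrm{E}_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}}\log f(\mathbf{x};\boldsymbol{\theta}) \,\middle|\, \Psi=m\right]\nabla^T_{\boldsymbol{\theta}}\log\Pr(\Psi=m;\boldsymbol{\theta}) \\
& + \nabla_{\boldsymbol{\theta}}\log\Pr(\Psi=m;\boldsymbol{\theta})\,\nabla^T_{\boldsymbol{\theta}}\log\Pr(\Psi=m;\boldsymbol{\theta}).
\end{aligned} \tag{81}
$$

Since $\mathcal{A}_m$ is independent of $\boldsymbol{\theta}$, and by using regularity condition C.2, it can be noticed that

$$
\nabla_{\boldsymbol{\theta}}\log\Pr(\Psi=m;\boldsymbol{\theta}) = \frac{\nabla_{\boldsymbol{\theta}}\Pr(\Psi=m;\boldsymbol{\theta})}{\Pr(\Psi=m;\boldsymbol{\theta})} = \frac{\nabla_{\boldsymbol{\theta}}\int_{\mathcal{A}_m} f(\mathbf{x};\boldsymbol{\theta})\,d\mathbf{x}}{\Pr(\Psi=m;\boldsymbol{\theta})} = \frac{\int_{\mathcal{A}_m}\nabla_{\boldsymbol{\theta}} f(\mathbf{x};\boldsymbol{\theta})\,d\mathbf{x}}{\Pr(\Psi=m;\boldsymbol{\theta})} = \mathrm{E}_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}}\log f(\mathbf{x};\boldsymbol{\theta}) \,\middle|\, \Psi=m\right]. \tag{82}
$$

Substitution of (82) in (81) results in (17).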

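In addition, under the assumption that the integral $\int_{\mathcal{A}_m} f(\mathbf{x}|\Psi=m;\boldsymbol{\theta})\,d\mathbf{x}$ can be twice differentiated under the integral sign, it is known that (Lemma 2.5.3 in [36])

$$
\mathrm{E}_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}}\log f(\mathbf{x}|\Psi=m;\boldsymbol{\theta}) \,\middle|\, \Psi=m\right] = \mathbf{0}
$$

for any $\boldsymbol{\theta}\in\mathbb{R}^M$. Therefore, by using the product rule twice, we obtain

$$
\begin{aligned}
\mathbf{J}_m(\boldsymbol{\theta},\Psi) &= \mathrm{E}_{\boldsymbol{\theta}}\left[\nabla_{\boldsymbol{\theta}}\log f(\mathbf{x}|\Psi=m;\boldsymbol{\theta})\,\nabla^T_{\boldsymbol{\theta}}\log f(\mathbf{x}|\Psi=m;\boldsymbol{\theta}) \,\middle|\, \Psi=m\right] \\
&= \nabla_{\boldsymbol{\theta}}\,\mathrm{E}_{\boldsymbol{\theta}}\left[\nabla^T_{\boldsymbol{\theta}}\log f(\mathbf{x}|\Psi=m;\boldsymbol{\theta}) \,\middle|\, \Psi=m\right] - \mathrm{E}_{\boldsymbol{\theta}}\left[\nabla^2_{\boldsymbol{\theta}}\log f(\mathbf{x}|\Psi=m;\boldsymbol{\theta}) \,\middle|\, \Psi=m\right] \\
&= -\mathrm{E}_{\boldsymbol{\theta}}\left[\nabla^2_{\boldsymbol{\theta}}\log f(\mathbf{x}|\Psi=m;\boldsymbol{\theta}) \,\middle|\, \Psi=m\right].
\end{aligned} \tag{83}
$$

By substituting (80) in (83), we obtain (18).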


Appendix C: Numerical PSML estimation method 

In some instances, the post-selection likelihood function and its gradient are intractable, so that finding the PSML, even by the iterative methods in (37) and (38), may be difficult. In these cases, we can use the previous estimate, $\hat{\boldsymbol{\theta}}^{(i)}$, to construct a nonparametric estimator of $\mathbf{g}_m(\boldsymbol{\theta})=\nabla_{\boldsymbol{\theta}}\log\Pr(\Psi=m;\boldsymbol{\theta})$ at $\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}^{(i)}$ from simulated realizations of the observation model, and then substitute it in (37) or (38). The resulting iterative PSML (IPSML) algorithm is described in Table 0.
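A sketch of such a simulation-based gradient estimate is given below, assuming Python with NumPy and using the Gaussian model of Section V-B as a test case. It relies on the identity (82), $\mathbf{g}_m(\boldsymbol{\theta})=\mathrm{E}_{\boldsymbol{\theta}}[\nabla_{\boldsymbol{\theta}}\log f(\mathbf{x};\boldsymbol{\theta})|\Psi=m]$, and replaces the conditional expectation with an average over simulated realizations that fall in $\mathcal{A}_m$. This is one possible estimator, not necessarily the one used in the IPSML algorithm; function names and parameter values are hypothetical.

```python
import math
import numpy as np

def grad_log_pr_mc(theta, sigma2, N, m, n_mc=400_000, seed=4):
    """Simulation-based g_m(theta) via (82) for the Gaussian model (44) and the SMS rule.
    The score N Sigma^{-1}(theta_ML - theta) depends on x only through the sample
    means (46), so those are simulated directly."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    s2 = np.asarray(sigma2, dtype=float)
    ml = theta + np.sqrt(s2 / N) * rng.standard_normal((n_mc, 2))
    in_am = ml.argmax(axis=1) == m                  # realizations with x in A_m
    score = N * (ml[in_am] - theta) / s2            # N Sigma^{-1}(theta_ML - theta)
    return score.mean(axis=0)

def grad_log_pr_exact(theta, sigma2, N, m):
    """Closed form from (47): (phi(Delta_m) / (sigma Phi(Delta_m))) (e_m - e_k)."""
    sigma = math.sqrt((sigma2[0] + sigma2[1]) / N)
    k = 1 - m
    d = (theta[m] - theta[k]) / sigma
    r = (math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)) / (0.5 * (1.0 + math.erf(d / math.sqrt(2.0))))
    g = np.zeros(2)
    g[m], g[k] = r / sigma, -r / sigma
    return g

g0_mc = grad_log_pr_mc([0.0, 0.1], [1.0, 0.1], 10, m=0)
g0_exact = grad_log_pr_exact([0.0, 0.1], [1.0, 0.1], 10, m=0)
```

In an IPSML-style loop, such an estimate would replace $\nabla_{\boldsymbol{\theta}}\log\Pr(\Psi=m;\boldsymbol{\theta})$ inside the post-selection score used in (37) or (38).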